Dkt. No. 50425/143
ELF3 GENE COMPOSITIONS AND METHODS 

Background 

(1) Field of the Invention 
The present invention generally relates to methods and compositions
useful for determining whether a patient has cancer or is at risk for cancer. 
More specifically, the invention relates to ELF3 gene compositions that are 
associated with cancer, particularly breast cancer, and methods using those 
compositions in cancer diagnosis.

The diagnosis of breast cancer requires great skill by pathologists to
properly classify biopsies into current pathological groupings. The proper
interpretation of pathological findings has great consequences to patients as it 
can result in varying treatments for primary cancer. However, there remains 
confusion about the relationship between different forms of breast cancer. For 
example, there is uncertainty as to how invasive lobular cancer is different from 
invasive duct carcinoma. It is also not known whether all forms of invasive duct
carcinoma are the same. 

In spite of burgeoning molecular genetic technology and widespread 
human genome sequence information, no unique genetic marker has been found 
for the most common forms of breast cancer. The BRCA1 and BRCA2 genes have
been useful in identifying patients at risk for familial forms of breast and
ovarian cancer, but only a small percentage of breast cancers occur in
patients with BRCA abnormalities. BRCA genes can be tested from DNA
isolated from peripheral blood but this technology is not offered routinely to 
most women with breast cancer. Gene chip technology allows scientists to look
for overexpression or underexpression of otherwise normal genes. Studies with
gene chips are beginning to reveal various patterns of gene expression in
breast cancer cells that do not occur with normal cells. However, gene chip 
technology is complex and expensive and is currently performed on actual 
biopsy tissue, which is not always available. 

Another genetic marker, the ELF3 gene, is overexpressed in intraductal
carcinoma (also called ductal carcinoma in situ [DCIS]). The ELF3 protein 
belongs to the ETS family of transcription factors, which contain a helix-loop- 
helix motif that is required to bind in the major groove of DNA sequences 
centered over a conserved core GGAA/T motif, and which is important for 
HER2/neu function (Chang et al., 1997; Oettgen et al., 1997a; Tymms et al., 
1997; Andreoli et al., 1997; Choi et al., 1998; Chang et al., 1999; Oettgen et 
al., 1999; Oettgen et al., 1997b). 

The ELF3 gene, which has also been called ESE-1, ERT, jen, and ESX, is a 
member of the subfamily of ELF (E74-like-factor) genes. The human ELF3 gene 
contains 9 exons and 8 introns (Chang et al., 1999; Oettgen et al., 1999), is 
located on chromosome 1q32.1-32.2 (Oettgen et al., 1997a; Tymms et al.,
1997), and its transcribed RNA product is ~5.8 kb. It is thought to be
expressed only in epithelial cells (Chang et al., 1997; Tymms et al., 1997; 
Brembeck et al., 2000) and its expression is induced during epidermal 
differentiation. The epithelial-specific expression pattern of ELF3 is unique 
among members of the ETS family, and to date very few epithelial-specific
transcription factors have been identified. Its DNA-binding domain, conserved
among all ETS family members, is located in exons 8 and 9 (Oettgen et al.,
1999). 

As a transcriptional regulatory gene, ELF3 overexpression or alteration 
may play a role in carcinogenesis. ELF3 mRNA is overexpressed in ductal
carcinoma in situ (DCIS) (Id.), in which there is a high incidence of HER2/neu
amplification and overexpression (Barnes et al., 1992). Excess chromosome 1 is
common in breast cancer (as well as lung and prostate cancer), and ELF3 may
be similarly amplified.

Currently, it is believed that DCIS is the precursor lesion of invasive duct
carcinoma (Rosen, 2001a). DCIS apparently arises from the terminal
duct-lobular unit where the cell of origin is believed to be a terminal ductal
epithelial cell (Rosen, 2001a; Wellings, 1975). Many different forms of DCIS
exist including comedo, cribriform, micropapillary and solid type (Rosen,
2001a). Diagnoses of these forms of DCIS have been increasing in part because
mammography has played an increasingly major role in detecting these often
non-palpable tumors. As many as 43% of tumors detected mammographically
have been DCIS (Andersson, 1984; Sigfusson et al., 1983; Tabar et al., 1984;
Verbeek et al., 1984; Fonseca et al., 1997). Invasive duct carcinoma is believed
to occur when the ductal carcinoma cells breach the myoepithelial basement
membrane and invade into the stroma. Invasive duct carcinoma is often found
in conjunction with a DCIS lesion (Rosen, 2001a).

DCIS is generally distinctly different from lobular carcinoma, which can
also form both in situ-like lesions (lobular carcinoma in situ) and invasive
lesions (invasive lobular carcinoma). Lobular carcinoma in situ arises from the
lobular cell itself (Rosen, 2001b). Most authorities do not consider lobular
carcinoma in situ as a neoplastic lesion but as an indicator of increased cellular
activity. This increased cellular activity is associated with an increased risk of
other forms of breast cancer, notably DCIS and invasive duct carcinoma, as well
as invasive lobular carcinoma. Some authorities feel, however, that lobular
carcinoma in situ is the precursor lesion of invasive lobular carcinoma. Lobular
carcinoma in situ lesions are inconspicuous and non-palpable, are often
multicentric, can form signet ring-like cells and are associated with a distinctive
type of infiltration (Rosen, 2001a). Mucin can be seen in an intracytoplasmic
location in these cells. C-adherins are absent from these lesions. The cellular
origin of these lesions is presumed to be the lobular cell.

Currently there is no genetic marker that distinguishes lesions of
terminal duct origin from those of lobular origin. In biopsy material from
neoplastic breast lesions, these different cancers can be distinguished using
some stains of mucin, cytokeratin and C-adherin, but there is no useful genetic
marker that distinguishes these different cancers.

There is thus a need for new genetic markers to identify breast cancer, 
particularly DCIS. The present invention provides such markers. 

Summary of the Invention 

Accordingly, the present invention is based on the discovery of an
association between cancer and novel ELF3 gene and/or ELF3 message (mRNA)
sequences. The novel sequences include intron retention in the mRNA, a novel 
Alu sequence in the ELF3 gene and a novel 5' untranslated region (UTR) in the 
ELF3 gene. 

Thus, in some embodiments, the present invention is directed to cDNAs
of a human ELF3 gene. In these embodiments, the cDNAs comprise an intron of
the ELF3 gene or a portion of an intron of the ELF3 gene. Vectors comprising 
the cDNA and cells transfected with those vectors are also envisioned. 

In other embodiments, the invention is directed to sets of two primers 
useful for amplifying any of the ELF3 sequences associated herein with cancer, 
e.g., mRNA retaining an ELF3 intron, Alukwd, and the novel 5' UTR described
herein. 

The present invention is additionally directed to isolated nucleic acids or 
mimetics comprising a sequence homologous to at least a portion of an intron of 
a human ELF3 gene.

The invention is also directed to isolated nucleic acids or mimetics
comprising a sequence at least 95% homologous to SEQ ID NO: 13 or SEQ ID 
NO:15. 

Vectors comprising any of the above nucleic acids or mimetics, and cells 
comprising those vectors, are also within the scope of the invention.

Additionally, the invention is directed to probes comprising any of the
above nucleic acids or mimetics. In these embodiments, the probes further 
comprise a detectable label. 

In additional embodiments, the invention is directed to pairs of cell 
cultures, where each cell culture is of the same tissue type and is derived from
cancerous mammalian tissue, and where one of the cell lines is of cancerous 
cells and the other cell line is of matched noncancerous cells. 

The present invention is also directed to methods for determining 
whether a patient has cancer or is at risk for cancer. The methods comprise 
evaluating whether a cell in the patient comprises an ELF3 nucleic acid
sequence disclosed herein to be associated with cancer. These sequences
include an ELF3 mRNA retaining at least a portion of an intron, SEQ ID NO:15,
and an Alukwd.

The invention is additionally directed to kits for evaluating whether a
patient has cancer or is at risk for cancer. These kits comprise sets of two
primers homologous to a portion of an ELF3 gene. The primers are useful for
determining whether the patient comprises a nucleic acid sequence described
herein as associated with cancer. These sequences include ELF3 mRNA
retaining at least a portion of an intron, the novel ELF3 gene 5' UTR, and
Alukwd. The kits also comprise instructions directing the use of the primers for
determining whether a nucleic acid sequence amplified by the primers is
present in a nucleic acid preparation.

In related embodiments, the invention is directed to additional kits for
evaluating whether a patient has cancer or is at risk for cancer. These kits
comprise probes useful for determining whether the patient comprises a nucleic
acid sequence described herein as associated with cancer. These sequences
include ELF3 mRNA retaining at least a portion of an intron, the novel ELF3
gene 5' UTR, and Alukwd. The kits also comprise instructions directing the use of
the probe for determining whether a nucleic acid sequence homologous to the
probe is present in a nucleic acid preparation.

In additional embodiments, the invention is directed to methods for
determining whether a cell or other sample comprises a virus. The methods
comprise adding contents of the cell or adding a portion of the sample to a
culture, where the culture comprises a susceptible cell that is capable of
acquiring a particular characteristic upon infection with a virus. The particular
characteristic can be intron retention of ELF3 mRNA and/or acquisition of
Alukwd in an ELF3 gene. The methods further comprise determining whether
the susceptible cell has acquired the characteristic after addition of the contents
of the cell.

Brief Description of the Drawings 

FIG. 1 shows results from experiments relating to genomic DNA Southern
blots for probe GC3. Panel A shows a Southern blot using probe GC3 with 5 µg
of HpaII and MspI digested genomic DNA prepared from K151 breast cancer cell
cultures (lane T) and normal cell lines from the same effusion (lane N). The
GC3 probe hybridized only to tumor genomic DNA and not to normal amplicon
DNA with either HpaII or MspI digestion. Panel B shows the HpaII or MspI
digested tumor (lane T) and normal (lane N) genomic DNA electrophoresis
before transfer to the blot membrane for GC3 probe treatment, which served as
the DNA digestion and quantitation control.

FIG. 2 shows a gel of electrophoresed PCR products establishing the 
presence of the GC3 202 bp DNA fragment in both breast tumor and normal cell 
lines. DNA isolations from 3 breast tumor cell lines and matched normal cell 
lines were amplified by GC3 primers, designed from the GC3 DNA sequence, in 
PCR reactions. Lane M, 100 bp DNA ladder; lanes T and N represent tumor and
normal cell lines, respectively; GC3 plasmid served as a positive control.

FIG. 3 shows a gel of electrophoresed products from a reverse 
transcriptase-polymerase chain reaction (RT-PCR) amplification of GC3 in 
breast tumor cell cultures and matched normal cell cultures. The 202 bp GC3 
was amplified from breast tumor cell lines but not matched normal cell lines,
indicating the presence of GC3 in mRNA from the tumor lines but not the
normal lines. Lane T, breast tumor cell lines; lane N, normal matching line.
K151 is a myofibroblast cell line; K234 is a CD4+ T lymphocyte line. IL-1
served as a positive control for RT-PCR (lane p); lane n, negative control; lane
M, 100 bp DNA ladder.

FIG. 4 shows gels of electrophoresed PCR products of cDNAs from breast
tumor tissues and matched normal tissues. The gels demonstrate that the 202
bp GC3 fragment is present in mRNA of breast tumor tissues but not in matched
normal tissues. Six paired cDNAs from breast tumor and matched normal
tissues were amplified by GC3 primers in PCR reactions. GC3 was expressed in
four of six breast tumor tissues, but none of the six matched normal tissues
(Panel A). The presence of intact input RNA was checked in all samples by
amplification of human β-actin (Panel B). Lane M, 100 bp DNA ladder; lanes N
and T represent normal tissue and breast tumor, respectively. The patient ID
numbers are below the N and T lanes. DNA from K151 tumor cells was used
as a positive control (lane p); double distilled H2O was used as a negative
control in the PCR reactions.

FIG. 5 shows gels of electrophoresed PCR products showing that the 202
bp GC3 product was abolished by RNase digestion of isolated mRNA, but not by
DNase I digestion. Total cellular RNA prepared from K151 tumor cell lines was
subjected to DNase I (lane D) and RNase (lane R) digestion before cDNA
synthesis. RT-PCR was performed using GC3 primers. The 202 bp GC3 product
was produced from the DNase I-digested RNA isolate but not from the
RNase-digested RNA isolate. The result verified that the 202 bp GC3 is
generated by amplification of mRNA; contamination with genomic DNA is
excluded.

FIG. 6 shows a gel of electrophoresed PCR products evaluating nuclear or
cytoplasmic presence of GC3 in RNA from breast tumor cells. RNA was isolated
from nuclear (Nuc) and cytoplasmic (Cyto) fractions. PCR using GC3 primers
was performed on the RNA isolates with (RT+) or without (RT-) a prior reverse
transcription step. The presence of intact input RNA was checked in all samples
by amplification of human β-actin. Lane M, 100 bp DNA ladder. DNA from
K151 tumor cells was used as a positive control (Pos); ddH2O was used as a
negative control. The GC3 202 bp product was produced from both nuclear and
cytoplasmic mRNA from K151 tumor cell lines and nuclear mRNA from MCF7
cell lines; weakly produced on cytoplasmic mRNA from MCF7 cell lines; and
produced in nuclear mRNA from U937 cell lines only when the mRNA was
reverse transcribed to cDNA. No GC3 or β-actin products were produced on
RNA isolates without reverse transcription, ruling out contamination of RNA
isolates with genomic DNA.

FIG. 7 shows a gel of electrophoresed PCR products evaluating GC3
expression on cDNA libraries from K151 tumor cell lines for 5' RACE and RT-
PCR. The 5' RACE cDNA library was synthesized by modified lock-docking
oligo(dT) primer and SMART II oligo (SMART RACE cDNA Amplification Kit,
Clontech Inc.); cDNA was synthesized by oligo (dT)16 (RNA PCR Kit, Perkin
Elmer) as well as total cellular RNA and was amplified using GC3 primers.
GC3 was amplified from both tumor cell lines, regardless of the method
employed for cDNA synthesis. More importantly, GC3 was not amplified from 1
µg total cellular RNA from K151 tumor cell lines and 3 µg total cellular RNA
from K259 tumor cell lines, demonstrating no genomic DNA contamination in
the RNA isolations. A GC3 plasmid was used as a positive control for the PCR
reaction.

FIG. 8 shows a gel of electrophoresed PCR products evaluating 5' RACE 
and 3' RACE results from K151 and K259 cDNA. cDNAs for 5' RACE and 3' 
RACE were synthesized by using RNA from K151 and K259 breast tumor cell 
lines. In the 5' RACE, GC3 UPF (SEQ ID NO:18) and GC3 UPN (SEQ ID NO:19)
were used as the first and second primers. In 3' RACE, GC3 DF (SEQ ID NO:20) 
and GC3 DN (SEQ ID NO:21) were used as the first and second primers. 

FIG. 9 shows schematic diagrams illustrating different forms of the ELF3 
gene and their relation to cancer. Panel a shows the genomic organization of 
the human ELF3 gene. Exons 1 to 9 are represented by filled boxes, and the
introns in between are represented by lines. Panel b shows where unspliced
ELF3 mRNA was found. The entire introns 4, 5 and 6, as determined by 5'
RACE, and the GC3 fragment, as determined by RT-PCR, are indicated. The
numbers indicate the locations in the genomic sequence. Panel c shows the
fully spliced ELF3 mRNA. The exon 1 in the darkened box indicates a different
5' UTR from previously published sequences.

FIG. 10 shows a gel of electrophoresed PCR products evaluating the 
presence of spliced ELF3 mRNA in breast tumor cell lines in RT-PCR reactions. 
Primers 1-3, 3-6, 6-8 and 8-9 amplified ELF3 exons 1 to 3, 3 to 6, 6 to 8 and 8 to
9, respectively. The lengths of DNA fragments with and without intron retention
are labeled. DNA fragments without intron retention were observed in exons 1
to 3, 3 to 6, 6 to 8 and 8 to 9 in both breast tumor cell lines K151 and K259.

FIG. 11 shows gels of electrophoresed PCR products evaluating GC3
presence in genomic walking steps. Panel A: Up-stream walking; Panel B:
Down-stream walking; Panel C: Down-down stream walking. Lane M: 100 bp
DNA ladder; Lane 1: DraI library; Lane 2: StuI library; Lane 3: PvuII library;
Lane 4: EcoRV library.

FIG. 12 shows a gel of electrophoresed PCR products evaluating the
presence of the 315 bp Alukwd sequence exemplified herein, in normal donors
and breast cancer patients. The DNA from breast cancer cell lines (K151T,
K234T and K259T), normal cell lines from a patient with breast tumor (K234 N)
and normal cells from donors without breast cancer (donor J and donor S) were
amplified with Alukwd primers. The 451 bp DNA fragment was amplified in all
samples. A plasmid containing the Alukwd DNA fragment from K151 tumor cells
was used as a positive control.

FIG. 13 shows gels of electrophoresed PCR products establishing Alukwd
retention in mRNA of breast tumor cell lines, but not normal cells. cDNA from
K151 and K234 breast tumor and matched normal cell lines (lanes K151 and
K234 T and N, respectively); K259 breast tumor cell line and donor 1 PBMC
(lane K259-T and N, respectively); and MCF-7 breast cancer cell line were
amplified by Alu primers (A) and β-actin primers (B). Alukwd was present in
mRNA from all breast tumor cells and no normal cells. β-actin presence in
similar amounts in all samples except the negative control indicated RNA
integrity and equivalent quantity in all of the samples tested.
FIG. 14 shows gels of electrophoresed PCR products evaluating ELF3
intron 7 retention in mRNA in peripheral blood mononuclear cells (PBMC) from
breast cancer patients with clinical remission. The mononuclear cells from
pleural effusion (PE cells) of late-stage breast cancer patients and PBMC
from breast cancer patients in the remission period were used for RNA isolation.
Synthesized cDNA was amplified with GC3 primers for intron 7 retention (Panel
A) and β-actin primers for RNA integrity and quality control (Panel B). Intron 7
retention occurred in 2 of 3 cell preparations from pleural effusion of late-stage
breast cancer patients and in 1 of 3 PBMC from early-stage breast cancer
patients with clinical remission.

FIG. 15 shows gels of electrophoresed PCR products establishing the
association of ELF3 mRNA multiple intron retention in PBMC with the human
breast cancer DCIS. cDNA from 10 breast cancer patients was amplified with
GC3 primers to test for intron 7 retention (Panel A), Alukwd primers to test for
intron 8 retention (Panel B), and β-actin primers for RNA quality control (Panel
C). The results showed intron 7 retention occurred in 4 of 5 PBMC from
patients with breast cancer with DCIS subtype and 0 of 5 PBMC from patients
with breast cancer with other subtypes. Intron 8 Alukwd retention occurred in
PBMC from 3 of 5 patients with breast cancer with DCIS subtype and 0 of 5
patients with other subtypes of breast cancer. K151 5' RACE cDNA library
served as a positive control in all assays.

FIG. 16 shows gels of electrophoresed PCR products establishing that
Alukwd is present in retained intron 8 in ELF3 mRNA of breast tumor tissues but
not matched normal tissues. cDNA from 8 sets of breast tumor and matched
normal tissues were amplified by Alukwd primers. Alukwd was present in ELF3
mRNA of 5 of 8 breast tumor tissues and 0 of 8 normal tissues (Panel B).
Integrity and quantity of RNA was checked in all samples by amplification of
human β-actin (Panel A). Lane M, 100 bp DNA ladder; lanes N and T represent
normal tissue and breast tumor, respectively. The patient ID numbers are below
the N and T lanes. DNA from K151 tumor cells was used for a positive control
(lane p); ddH2O was the negative control (lane n).

FIG. 17 shows gels of electrophoresed PCR products establishing the
presence of Alukwd expression in cytoplasmic and nuclear RNA in human breast
cancer cell lines. Nuclear and cytoplasmic RNA was purified from human breast
cancer cell lines K151, K259 and MCF-7, human cervical carcinoma cell line
C33-A, human histiocytic lymphoma cell line U-937, and human acute T cell
leukemia cell line Jurkat. Integrity and quantity of RNA was checked in all
samples by amplification of human β-actin (Panel A). Alukwd was present in
cytoplasmic and nuclear RNA from human breast cancer cell lines K151, K259
and MCF-7, and in C33-A and U-937 nuclear but not cytoplasmic RNA, and was
absent in Jurkat cytoplasmic and nuclear RNA (Panel B). Negative Alukwd PCR
results on the same RNA isolates run in the same test demonstrated that there
was no DNA contamination in these RNA isolations (Panel C).

FIG. 18 shows gels of electrophoresed PCR products demonstrating that
the Alukwd and β-actin product was abolished by RNase digestion of RNA but not
by DNase I digestion. Total cellular RNA prepared from the K151 tumor cell
line was subjected to DNase I (lane D) and RNase (lane R) digestion prior to
cDNA synthesis. RT-PCR was performed using β-actin primers (Panel A) and
Alukwd primers (Panel B). The expected PCR product was produced from the
DNase I-digested RNA isolate but not from the RNase-digested RNA isolate,
when both the β-actin and Alukwd primers were used. The result verifies that the
415 bp Alukwd product is generated by amplification of mRNA; contamination
with genomic DNA is excluded. An RNA isolation from the K151 tumor cell line
without digestion was used as positive control for RT-PCR (pos 1); DNA from
the K151 tumor cell was used as a positive control for the PCR reaction (pos 2);
ddH2O was used as a negative control in the RT-PCR reaction (neg).

FIG. 19 shows gels of electrophoresed PCR products demonstrating ELF3
mRNA retention of intron 7 in breast tumor cells. Various concentrations of
intron 7-expressing cells (K259 tumor cell lines) were spiked into 2x10^6 PBMC
prepared from a normal blood donor. cDNA from those samples was amplified
with GC3 primers for intron 7 expression (Panel A) or β-actin primers for RNA
integrity and quality control (Panel B). In the same experiment, the RNA
isolates were also amplified with β-actin primers to detect DNA contamination
in those RNA isolates (Panel C). Negative (neg) and positive (pos) controls
were ddH2O and RNA from the K151 tumor cell line, respectively. Intron 7
retention was observed at K259 breast tumor cell concentrations from 10^6 to
10^3 per 2x10^6 normal cells. Positive β-actin expression in all samples that were
reverse transcribed demonstrated equal amounts of RNA input in the RT-PCR
reaction; negative β-actin expression in the RNA isolates that were not reverse
transcribed ruled out the possibility of DNA contamination.

Detailed Description of the Invention

The present invention is based on the discovery of novel ELF3 gene and 
ELF3 message (mRNA) sequences. The novel sequences include intron 
retention in the mRNA; a novel Alu sequence in the ELF3 gene and mRNA; and 
a novel 5' untranslated region (UTR) in the ELF3 gene. These novel sequences,
which can be isolated from cancerous tissue biopsies as well as peripheral blood
mononuclear cells (PBMCs), are associated with the presence of cancer in a 
patient having the novel sequences. In particular, the sequences are associated 
with breast cancer, especially ductal carcinoma in situ (DCIS). 

Based on the association between the sequences and cancer, methods
which detect the presence of any of the sequences in a patient are useful in the
diagnosis of cancer.

While the strongest association of the presence of these sequences is with 
DCIS, the sequences have also been associated with other cancers, in particular 
other forms of breast cancer, and methods for detecting other forms of cancer
using these sequences are also useful. Nevertheless, the very strong association
with DCIS allows one to distinguish DCIS from other forms of breast cancer 
using these sequences with a high probability. 

Thus, in some embodiments, the invention is directed to cDNAs of a
mammalian ELF3 gene, or fragments thereof at least 20 nucleotides long,
which comprise an intron of the ELF3 gene or a portion of an intron of the ELF3
gene. Fragments of the cDNA are preferably longer than 20 nucleotides,
for example at least 50, at least 100, at least 500, or at least 1000 nucleotides
long.

As used herein, a cDNA has its common meaning, that is, a DNA
comprising the sequence of a reverse-transcribed polyA-containing mRNA. This
includes amplified products of the reverse-transcribed mRNA, such as products
from an RT-PCR procedure. Since a cDNA is a reflection of the mRNA that is
present, an ELF3 cDNA that retains an intron of the ELF3 gene indicates that the
mRNA has inappropriately retained an ELF3 gene intron, which is associated
with cancer, particularly DCIS (see Example 1). An example of a normally
spliced ELF3 cDNA (without an intron or portion) is provided as SEQ ID NO:2.
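
As a rough illustration of how intron retention would show up in an RT-PCR
product, the following minimal Python sketch compares the product length
expected from fully spliced mRNA with the length expected when an intron lying
between the two primers is retained. The function names, the example lengths
and the 10 bp tolerance are hypothetical and are not taken from the Examples.

```python
def expected_product_lengths(exonic_bp, intron_bp):
    """Return (spliced_bp, retained_bp) for a primer pair spanning one intron.

    exonic_bp: exonic bases from the first primer through the second primer.
    intron_bp: length of the intron lying between the two primers.
    Both values are hypothetical inputs supplied by the user.
    """
    if exonic_bp <= 0 or intron_bp <= 0:
        raise ValueError("lengths must be positive")
    return exonic_bp, exonic_bp + intron_bp


def classify_band(observed_bp, spliced_bp, retained_bp, tolerance_bp=10):
    """Label an observed band as matching the retained or spliced length.

    A band is assigned to a category only if it lies within tolerance_bp
    of the corresponding expected length; otherwise it is "unexpected".
    """
    if abs(observed_bp - retained_bp) <= tolerance_bp:
        return "intron retained"
    if abs(observed_bp - spliced_bp) <= tolerance_bp:
        return "spliced"
    return "unexpected"


# Hypothetical example: 150 bp of exon sequence and a 300 bp intron.
spliced, retained = expected_product_lengths(150, 300)
print(classify_band(448, spliced, retained))
```

A size comparison of this kind is only a first check; as the figure
descriptions indicate, controls without reverse transcription are still needed
to exclude amplification from contaminating genomic DNA.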

In preferred embodiments, the ELF3 cDNA comprises intron 4, intron 5,
intron 6, intron 7, intron 8, portions of any of those introns, or combinations of
any of those introns or portions. Introns 4, 5, 6, 7 and 8 of the ELF3 gene can
be readily identified by the skilled artisan by consulting public databases such as
GenBank, where a human ELF3 gene is provided as Accession AF110184 (SEQ
ID NO:1). An amino acid sequence (SEQ ID NO:3), the translation of SEQ ID
NO:1 (after mRNA processing), is also provided under Accession AF110184.
See Appendix, identifying SEQ ID NO:5, SEQ ID NO:6, SEQ ID NO:7, SEQ ID
NO:8 and SEQ ID NO:9 as introns 4, 5, 6, 7 and 8, respectively.

An example of a retained sequence that is associated with cancer is SEQ
ID NO:11 (Example 1 - also identified therein as GC3), which is present in the
mRNA (and derived cDNA) of cancer patients as retained portions of introns 7
and 8.

These embodiments are not limited to any specific ELF3 cDNA or intron
sequences such as SEQ ID NO:2, 5, 6, 7, 8, or 9. Rather, homologous sequences 
from any mammal, or alternative human sequences are also envisioned as 
within the scope of the invention. The skilled artisan would understand that 
there could be small variations among ELF3 gene, cDNA, or amino acid
sequences between mammals, or among humans. For example, SEQ ID NO:3
and SEQ ID NO:4 provide alternative amino acid sequences resulting from the
translated gene provided as SEQ ID NO:1 (starting at nt 5319) and the cDNA
SEQ ID NO:2, respectively (see Appendix).

In some aspects of these embodiments, the cDNA or portion also
comprises an Alukwd. Alukwd is a novel Alu sequence that is found inserted into
ELF3 introns in cancerous tissue as well as PBMCs of cancer patients (see
Example 2). In particular, Alukwd is found in breast cancer, especially DCIS.

One example of Alukwd consists of the sequence provided herein as SEQ
ID NO:13. However, based on the understanding that Alu sequences have many
variants, such that they can be logically divided into families that are at least
about 90%, more preferably 95%, homologous to each other (Roy-Engel et al.,
2001), it would be expected that Alukwd exists as several different sequences
that are at least about 90% homologous to each other. It would also be
expected that any one of those forms of Alukwd would be associated with cancer.

In preferred embodiments, the Alukwd is found in cDNAs of cancer
patients within a retained intron 8. In more preferred embodiments, the Alukwd
is between nucleotides 8762 and 8763 using the numbering of SEQ ID NO:1.
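
To make the "between nucleotides 8762 and 8763" convention concrete, the
following minimal Python sketch places an insert between two adjacent
positions of a reference sequence using 1-based numbering, as is customary for
sequence listings. The function name and the toy sequences are hypothetical;
the actual sequences of SEQ ID NO:1 and SEQ ID NO:13 are not reproduced here.

```python
def insert_between(reference, insert, left_nt):
    """Insert `insert` between 1-based positions left_nt and left_nt + 1.

    With 1-based numbering, the first left_nt nucleotides of the reference
    precede the insert and the remainder follows it.
    """
    if not 1 <= left_nt < len(reference):
        raise ValueError("left_nt must have a neighbour on its right")
    return reference[:left_nt] + insert + reference[left_nt:]


# Toy example: place "GG" between nucleotides 4 and 5 of an 8-nt reference.
print(insert_between("AAAACCCC", "GG", 4))
```

For a real reference, the same call would be made with left_nt=8762 on a
sequence numbered as in SEQ ID NO:1.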

In some embodiments, the cDNA of the present invention comprises the
entire ELF3 gene coding region, i.e., from the 5' UTR to the polyA tail. In other
embodiments, the cDNA consists of only a fragment of the full length coding
region, comprising at least 20 nucleotides of the coding region. The latter
fragment could be obtained through reverse transcription polymerase chain
reaction (RT-PCR) of cellular mRNA or total RNA, using PCR primers that do
not amplify the entire coding region. Such methods are well known.

In some preferred embodiments, the cDNA comprises introns 4, 5, 6 and
7 of the ELF3 gene, for example those provided as SEQ ID NO:5, 6, 7 and 8,
respectively. In other embodiments, the cDNA comprises the 5' UTR of the
ELF3 mRNA. Preferably, the 5' UTR comprises the nucleotide sequence
provided herein as SEQ ID NO:15, or a variant of SEQ ID NO:15 that is at least
about 90% homologous to SEQ ID NO:15.
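
The percentage thresholds used above (at least about 90%, or 95%, homologous)
can be illustrated with a minimal Python sketch. It assumes that "homologous"
is read as percent identity over two sequences that have already been aligned
to equal length, with "-" marking gaps; that reading, the function names and
the toy sequences are assumptions made for illustration and are not a
definition taken from this specification.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over two pre-aligned, equal-length sequences.

    Columns where both sequences have a gap are ignored; a column with a
    gap in only one sequence counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" and b == "-":
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


def meets_threshold(aligned_a, aligned_b, threshold=90.0):
    """True when the two aligned sequences reach the identity threshold."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Toy example: nine of ten aligned positions match.
print(percent_identity("ACGTACGTAC", "ACGTACGTAA"))
```

In practice the alignment itself would come from an established alignment
tool; this sketch covers only the final percentage calculation.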

A preferred example of a full length cDNA comprising SEQ ID NO:15 is
SEQ ID NO:2, where the cDNA is interspersed by one or more introns.

In preferred embodiments, the cDNA of the present invention is prepared
from a composition comprising a cell, for example a tissue sample from a
patient or from PBMCs. In some of these embodiments, the cell further
comprises genomic DNA comprising an Alukwd, for example consisting of SEQ ID
NO:13. Preferably, the Alukwd is between nucleotides 8762 and 8763 of an ELF3
gene in the cell, using the numbering of SEQ ID NO:1.

In other preferred embodiments, the cDNA is prepared from a
composition comprising a cell, where the cell is obtained from a patient being
tested for breast cancer. Preferably, the patient is at high risk for breast cancer.
In these embodiments, the cell composition is preferably a PBMC composition
or a biopsy of tissue (preferably breast tissue) or an effusion suspected of being
cancerous.

The cDNA can be prepared using any method known in the art.
In preferred embodiments, the cDNA is prepared using RT-PCR. Those RT-PCR
methods would utilize primers suitable for amplifying at least a portion of an
ELF3 gene sequence suspected of being associated with cancer, such as ELF3
intron 4, 5, 6, 7 or 8, an Alukwd, or the novel ELF3 5' UTR identified herein. See
Examples.

Included herein as an RT-PCR technique is the nucleic acid sequence-
based amplification ("NASBA") method, as described, for example, in U.S.
Patent No. 6,326,173, and references cited therein.

Primers (i.e., a set of two primers) are suitable for amplifying a region of
an ELF3 gene when the primers flank the region and allow amplification of that
region using PCR. Sequence-specific primers related to a mammalian ELF3
gene, ELF3 mRNA or corresponding cDNA, or to an intron of the ELF3 gene are
also useful in methods of detecting target ELF3 sequences by sequencing
reactions, as an alternative to PCR-based methods.

The present invention is also directed to vectors comprising any of the
above-described cDNAs. As used herein, a vector takes its common molecular
biology meaning, that is, a piece of nucleic acid capable of replication in a host
cell. Preferred examples include plasmid vectors and viral vectors. Such vectors
are useful for preserving and increasing the amount of a cDNA in a cell.

In related embodiments, the invention is also directed to cells transfected
with any of the above vectors, such that the vector is capable of replication in
the cell. Any cell supporting replication of the vector, including prokaryotic and
eukaryotic cells, is envisioned as within the scope of these embodiments. Also
included are cells where the vector sequence comprising the cDNA is integrated
into a chromosome of the cell, or where the vector autonomously replicates in
the cell, independent of chromosomal replication.

In other embodiments, the invention is directed to various isolated
nucleic acid or mimetic sequences. Each of the sequences is useful for, e.g.,
determining whether the sequence is present in a sample, for example a PBMC
preparation or a biopsy. The sequences are preferably greater than 10 or 20
nucleotides long and less than 50 kb. More preferably, the sequences are less
than 12 kb. An example of a useful sequence less than 12 kb is a full length
sequence of the ELF3 gene from a patient being diagnosed for cancer, e.g.,
DCIS. The sequence could be analyzed for the novel 5' UTR or the novel Alu kwd,
both identified in the experiments discussed in the Examples. In other aspects
the sequences are less than 2 kb, or 1 kb, or 500 nt, e.g., to be able to more
usefully clone the novel 5' UTR or the novel Alu kwd, perhaps with flanking
sequences, into a vector to clone into a cell such as an E. coli or a mammalian
cell. Optionally, the sequences can incorporate a detectable label, to identify
the novel 5' UTR, the novel Alu kwd, or any intron retained in an ELF3 sequence
by hybridization.

These sequences can be comprised of DNA, RNA or a mimetic. As used
herein, a mimetic is a nucleotide analog that differs chemically from a naturally
occurring nucleotide, but that is capable of oligonucleotide-like noncovalent
binding to a homologous nucleotide sequence. See, e.g., U.S. Patent 6,436,909
for a discussion of useful mimetics. A preferred example of a useful mimetic is a
phosphorothioate mimetic; such mimetics are well known.

In some embodiments the nucleic acids or mimetics comprise a sequence
homologous to at least a portion of an intron of a human ELF3 gene, and may
optionally incorporate a detectable label. These sequences are useful, e.g., for
determining if ELF3 mRNA from the sample has retained at least a portion of an
intron. In preferred embodiments, the intron to which the nucleic acids or
mimetics are homologous is intron 4, intron 5, intron 6, intron 7 or intron 8,
exemplified herein as SEQ ID NO:5, 6, 7, 8 or 9, respectively.

In other embodiments, the nucleic acids or mimetics comprise a sequence
at least 95% homologous to at least a portion of SEQ ID NO:13, useful, e.g., for
determining whether a member of the Alu kwd family is present in either DNA or
mRNA from the sample. Preferably, the sequence is completely homologous to
SEQ ID NO:13. As with previous embodiments, these sequences can optionally
comprise a detectable label. The sequence can also comprise regions of the
ELF3 gene where the Alu kwd is expected, for example the regions on either side
of nucleotides 8762 and 8763 of the ELF3 gene, where the Alu kwd inserts
(see Example 2).

As used herein, a first sequence is at least 95% homologous to a second
sequence when the first sequence is at least 95% identical to the second
sequence or the complement of the second sequence. Where no percentage of
homology is used, "homologous" means completely homologous. A sequence,
e.g., a primer, is homologous to a longer sequence, e.g., an ELF3 gene, when the
sequence has complete identity to a portion of the longer sequence, or its
complement.
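
The homology definition above can be illustrated with a minimal Python sketch.
It rests on assumptions that the specification does not state: the two
sequences are compared without gaps and at equal length, and "complement" is
read as the reverse complement of the second sequence. The function names are
hypothetical.

```python
# Illustrative sketch only: ungapped comparison, equal-length sequences,
# and "complement" interpreted as the reverse complement (assumptions).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def percent_identity(a: str, b: str) -> float:
    """Percent of positions that match between two equal-length sequences."""
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


def percent_homology(first: str, second: str) -> float:
    """Best identity of `first` to `second` or to its reverse complement."""
    return max(
        percent_identity(first, second),
        percent_identity(first, reverse_complement(second)),
    )


def is_homologous_to_portion(short: str, longer: str) -> bool:
    """True when `short` is completely identical to a portion of `longer`
    or of the reverse complement of `longer`."""
    s, l = short.upper(), longer.upper()
    return s in l or s in reverse_complement(l)
```

Under these assumptions, a 20-nucleotide primer with one mismatch to its
target would score 95%, which sits exactly at the "at least 95% homologous"
threshold.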

In still other embodiments, the nucleotide or mimetic sequence is at least
95% homologous to at least a portion of SEQ ID NO:15, indicating that the
novel 5' UTR is present in either DNA or mRNA from the sample. Also useful
are sequences encoding an ELF3 open reading frame such as SEQ ID NO:3 or
SEQ ID NO:4 or their complement, adjoining the 3' end of SEQ ID NO:15.

Also included within the scope of the invention are vectors comprising
any of the nucleic acids described above. Cells transfected with these vectors
are also envisioned. These include both prokaryotic and eukaryotic cells,
including cells within multicellular organisms that have been transfected with
the vectors to determine the effect of the presence of the nucleic acid on the
organism.

In related embodiments, the invention is directed to probes which 
comprise any of the nucleic acid or mimetic sequences described above, further 
comprising a detectable label. The labels are useful, for example, for detecting 
and monitoring the presence of the nucleic acid sequence in an organism or a 
sample. Many detectable labels are known; the invention is not narrowly 
limited to any particular type of label. The type of label can be chosen as most 
appropriate for the particular use being employed. Examples include 
radioactive, fluorescent, spin, or hapten labels. The latter are labels that are 
detected using antibodies that specifically bind to the hapten. A well-known 
example is digoxigenin.

The sequences described herein as being associated with cancer could 
also be identified using sets of two primers that are suitable to amplify (e.g., 
using PCR or RT-PCR) and detect those sequences. Thus, the invention is also 
directed to sets of two primers, wherein each primer is homologous to a portion 
of the ELF3 gene. Preferably, the primers are less than about 50 nucleotides in 
length, more preferably less than about 40 nucleotides in length, and most 
preferably less than about 30 nucleotides in length.

In some aspects, at least one primer is homologous to a portion of an
intron of the ELF3 gene. In these aspects, when the primers are used in a
procedure such as RT-PCR, the primers amplify a defined mRNA sequence only
if an intron was present in the sequence.

In other aspects, primers that are homologous only to exon sequences are
useful if the two primers are homologous to different exons. In that
situation, the product of amplification would be one size if intron retention was
not present in the amplification product, and a larger size if an mRNA, or a
portion thereof, that does retain an intron is amplified.
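
The size difference described above for exon-anchored primers can be sketched
as simple arithmetic. The Python example below is only an illustration: the
function name is hypothetical, and the primer and intron coordinates used in
the usage note are invented, not taken from the ELF3 sequence.

```python
def rt_pcr_product_size(sense_5p, antisense_5p, introns, retained=frozenset()):
    """Expected RT-PCR product length in bp for primers in different exons.

    sense_5p, antisense_5p: genomic coordinates of the outer (5') ends of the
        sense and antisense primers (hypothetical inputs).
    introns: list of inclusive (start, end) genomic coordinates of the introns
        lying between the two primers.
    retained: indexes into `introns` of introns still present in the mRNA.
    """
    for start, end in introns:
        if not (sense_5p < start <= end < antisense_5p):
            raise ValueError("each intron must lie between the two primers")
    genomic_span = antisense_5p - sense_5p + 1
    # Only introns that were spliced out shorten the product.
    spliced_out = sum(
        end - start + 1
        for i, (start, end) in enumerate(introns)
        if i not in retained
    )
    return genomic_span - spliced_out
```

With invented coordinates (primers at 100 and 1000, introns at 301-500 and
701-800), the fully spliced product would be 601 bp, while retention of the
first intron would give 801 bp, so the two templates separate readily on a gel.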

As used herein, a primer is defined as homologous to another nucleotide
sequence if that primer is homologous to either strand of the duplex of that
sequence, provided the primer is useful when used with another primer in
amplification methods. Introns 4, 5, 6, 7, and especially 8, are preferred as
targets of these primers. To determine if an Alu kwd is present between
nucleotides 8762 and 8763 of the ELF3 gene, one of the primers would be
homologous to a region of the ELF3 gene 5' to nucleotide 8762 of the ELF3 gene,
and the other of the two primers would be homologous to a region of the ELF3
gene 3' to nucleotide 8763 of the ELF3 gene.
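
As a rough illustration of the flanking-primer test just described, the Python
sketch below checks that a primer pair brackets the 8762/8763 junction and
computes the expected product size with and without an insertion. The INSE-S
and INSE-AS positions are those listed in Table 1, used here purely as an
example; this section does not state that this pair was the one used for the
purpose. The insert length of 300 bp is a placeholder, since the length of
SEQ ID NO:13 is not given here.

```python
# Positions from Table 1 (ELF3 genomic numbering, SEQ ID NO:1).
INSE_S = (8659, 8680)     # sense primer, inclusive start and end
INSE_AS = (8774, 8795)    # antisense primer, inclusive start and end
JUNCTION = (8762, 8763)   # insertion lies between these two nucleotides


def flanks_junction(sense, antisense, junction):
    """True when the sense primer ends 5' of the junction and the
    antisense primer begins 3' of it."""
    return sense[1] < junction[0] and junction[1] < antisense[0]


def expected_product_sizes(sense, antisense, insert_len):
    """Return (size without insertion, size with insertion) in bp."""
    without_insert = antisense[1] - sense[0] + 1
    return without_insert, without_insert + insert_len


HYPOTHETICAL_INSERT_LEN = 300  # placeholder, not the length of SEQ ID NO:13
```

With these inputs the uninterrupted product is 137 bp, and any insertion
between the primers lengthens the product by the length of the insert.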

Other primer sets envisioned herein include sets suitable for amplifying
an Alu kwd. Examples of such primer sets are those where one or both primers are
at least 95% homologous to SEQ ID NO:13, including those where one or both
primers are completely homologous to a portion of SEQ ID NO:13. In the
embodiments where only one primer is homologous to SEQ ID NO:13, the other
primer is preferably homologous to a portion of an ELF3 gene, such as an intron
of an ELF3 gene, for example intron 8, identified in Example 2 to harbor an
Alu kwd.

Additional primer sets envisioned as within the scope of the invention 
are sets suitable for amplifying an ELF3 5' UTR that is at least 95% homologous 
to SEQ ID NO: 15. Preferably, at least one primer is homologous to SEQ ID 
NO:13 and the other primer is homologous to an ELF3 gene, for example the 3'
end of the open reading frame of an ELF3 gene. 

Since it is expected that cancers in any mammal would be associated
with the presence of any of the above ELF3 sequences, e.g., mRNAs retaining
introns or portions of the ELF3 gene, the Alu kwd, and the novel 5' UTR, the
invention encompasses these sequences from any mammalian
species, although in preferred embodiments, the mammal is a human.

Any ELF3 nucleotide sequence, including gene, cDNA, mRNA, primer,
and probe sequences, and ELF3 amino acid sequences from any mammal can be
readily identified by the skilled artisan as being at least about 80% homologous
to the analogous sequences provided herein. More preferably, the variants are
at least about 90% homologous; even more preferably about 95% or 99%
homologous; and most preferably completely homologous to the sequences
provided herein. All human ELF3 gene, cDNA and amino acid sequences would
be expected to be at least about 95% homologous to the analogous sequences
provided herein. The sequence of any mammalian ELF3 gene, cDNA, or amino
acid sequence could be obtained without undue experimentation by well known
methods.

Also envisioned as within the scope of the invention are pairs of cell
cultures, where both cell cultures are of the same tissue type and are derived
from cancerous mammalian tissue, and where one of the cell lines is of
cancerous cells and the other cell line is of matched noncancerous cells.
Examples include pairs of cell cultures prepared as described in Example 1, for
example the pair designated K259.

The invention is also directed to methods for determining whether a
patient has cancer or is at risk for cancer. The methods comprise evaluating
whether a cell in the patient comprises any of the ELF3 nucleic acid sequences
established herein to be associated with cancer. The sequences include those
indicating intron retention in an ELF3 mRNA, the novel 5' UTR (exemplified as
SEQ ID NO:15) and an Alu kwd (exemplified herein as SEQ ID NO:13). The
methods generally utilize any of the novel primers, probes, or nucleic acid
sequences described above. These methods are preferably done with a sample
of many cells, for example a PBMC preparation or a tissue biopsy from the
patient such as from a breast lesion or lymph node with metastatic cancer or a
cancerous effusion. As used herein, a biopsy is the removal of tissue from a
patient, including the removal of fluid from effusions, for example breast cancer
pleural effusions. The cells in the sample can be of one or more than one cell
type.

In some embodiments, these methods utilize primers in a polymerase
chain reaction (PCR) to amplify DNA to establish the presence or absence of the
tested ELF3 sequence. Reverse transcription of mRNA is also useful in some
embodiments to prepare cDNA for PCR, e.g., when determining whether mRNA
intron retention is present. See discussion of RT-PCR in the Examples. PCR
could also be used without reverse transcriptase, for example when determining
whether the novel 5' UTR is present in the genome of the cell.

In other embodiments, these methods utilize one of the probes described
above in northern hybridization. As is well known, northern hybridization
generally involves isolation of mRNA from the cell, electrophoresis of the mRNA
on a gel, blotting of the gel to transfer the mRNA to a membrane, and treating
the membrane with a probe, to determine whether a sequence homologous to
the probe is present on the gel and thus in the mRNA of the cell.

Other embodiments of these methods utilize one of the above-described
probes in Southern hybridization. As is well known, Southern hybridization
generally involves isolation of DNA from the cell, electrophoresis of the DNA on
a gel, blotting of the gel to transfer the DNA to a membrane, and treating the
membrane with a probe, to determine whether a sequence homologous to the
probe is present on the gel and thus in the DNA of the cell.

The invention is also directed to kits for evaluating whether a patient has
cancer or is at risk for cancer. The kits of these embodiments comprise at least
one set of two primers that are homologous to a portion of an ELF3 gene,
wherein the primers are useful for amplifying a nucleic acid sequence
established herein to be associated with cancer. As previously discussed, the
nucleic acids established herein to be associated with cancer include intron
retention in an ELF3 mRNA, the novel ELF3 5' UTRs identified herein
(exemplified by SEQ ID NO:15), and an Alu kwd (exemplified herein by SEQ ID
NO:13).

These kits also comprise instructions directing the use of the primers for
determining whether the nucleic acid sequence is present in a nucleic acid
preparation such as an mRNA, cDNA or genomic preparation, as appropriate.
These instructions need not be physically associated with the primers, but could
refer to the use of the primers from a source physically separated from the
primers, e.g., from a web site or a separately mailed paper.

As discussed above in the context of the primers of the invention, when
the primers are directed to determining whether there is intron retention in an
ELF3 mRNA, at least one primer is homologous to a portion of an intron of the
ELF3 gene, or the two primers are homologous to portions of the ELF3 gene
that flank an intron of the ELF3 gene.

In related embodiments, the invention is also directed to other kits for
evaluating whether a patient has cancer or is at risk for cancer. These kits
comprise a nucleic acid sequence and/or probe, as discussed above, which is
useful for determining whether a sample has one of the ELF3 gene sequences
identified herein as being associated with cancer. These kits also comprise
instructions directing the use of the nucleic acid sequence or probe for
determining whether a nucleic acid sequence homologous to the probe is
present in the sample.

In some embodiments, these kits comprise a gene chip having numerous
probes or nucleic acid sequences, for example probes or sequences for each of
the retained ELF3 introns and/or Alu kwd. Probes or sequences diagnostic for
other diseases, e.g., a BRCA I probe, could also be included. Gene chip
technology is well known in the art.

In further embodiments, the presence in a sample of one of the ELF3
gene sequences identified herein as being associated with cancer is detected by
sequencing RNA, cDNA or DNA of the sample, wherein the sequencing may be
accomplished by any of the various sequencing methods known in the art.

The inventors have also discovered that addition of a virus, e.g., Epstein-
Barr Virus (EBV), to a cell in culture, for example a BJAB cell, causes ELF3
mRNA intron retention and/or ELF3 Alu kwd appearance. See Example 3. Based
on this finding, a cell suspected of harboring a virus that causes ELF3 mRNA
intron retention can be easily assayed for presence of a virus.

Thus, the invention is also directed to methods for determining whether
a cell comprises a virus. The methods comprise a first step of adding the
contents of the cell to a culture, where the culture comprises a susceptible cell
that is capable of acquiring a characteristic upon infection with a virus. As
disclosed herein, the characteristic is ELF3 mRNA intron retention and/or
acquisition of an Alu kwd, for example SEQ ID NO:13, in an ELF3 gene. The
methods further comprise a second step of determining whether a susceptible
cell has acquired either or both of the above characteristics after addition of the
contents of the cell. An example of a susceptible cell is a BJAB cell, which is an
EBV-negative Burkitt's lymphoma cell. In preferred embodiments, the virus is
related to Epstein-Barr virus, preferably a member of the Herpesviridae, more
preferably a member of the Gammaherpesvirinae, and most preferably a
Lymphocryptovirus.

Preferred embodiments of the invention are described in the following
examples. Other embodiments within the scope of the claims herein will be
apparent to one skilled in the art from consideration of the specification or
practice of the invention as disclosed herein. It is intended that the
specification, together with the examples, be considered exemplary only, with
the scope and spirit of the invention being indicated by the claims which follow
the examples.

Example 1. Unspliced Elf3 Cytoplasmic mRNA in Human Breast Cancer Cells

Example Summary

Using modified representational difference analysis (mRDA), a DNA
fragment (denoted GC3) was isolated as a difference between a human breast
cancer cell line K151 (tester) and a normal cell line (driver) from the same
patient. GC3 proved to be a fragment of intron 7 of the ELF3 gene which
appears to be amplified in the K151 breast cancer cell line. The ELF3 gene
belongs to the ETS family of transcription factors which are frequently altered in
several types of cancer. This intron fragment of the ELF3 gene was expressed in
human breast cancer cell lines and 4 of 6 breast cancer tissues but not in
matched normal cell lines and normal tissues after testing by reverse
transcriptase PCR (RT-PCR). Genomic DNA contamination of RNA isolates was
excluded by DNase I and RNase digestion analysis. mRNA of GC3 was detected
in both nuclear and cytoplasmic RNA fractions of breast cancer cell lines,
indicating that intron-containing ELF3 mRNA had not been properly spliced
prior to export to the cytoplasm of these cancer cells. These findings were
verified using the 5' and 3' rapid amplification of cDNA ends (5' RACE and 3'
RACE) procedures to search for cDNA sequences in RNA from these cancer cell
lines. This revealed the presence of partially unspliced ELF3 mRNA and fully
spliced ELF3 mRNA in the same breast cancer cell line. Sequence analysis
confirmed that GC3 was indeed retained in cytoplasmic mRNA. Partially
unspliced ELF3 mRNA contained introns 4, 5, 6 and 7 without any nucleotide
mutation at intron/exon splice junction borders. Fully spliced 1959 bp ELF3
mRNA showed a different 5' UTR from the published ELF3 mRNA, and was
predicted to encode a 371 amino acid protein which shared 98% homology to
the ELF3 protein sequence. This is the first report of intron retention of ELF3
and/or the pathological appearance of both spliced and unspliced cytoplasmic
ELF3 mRNA present simultaneously in human breast cancer cells. The finding
that intron 7 of the ELF3 gene is present in breast cancer cell lines and tissues
(4 of 6 tested) from breast cancer and not in normal autologous breast tissue
and cell lines may be very important in the understanding of the pathogenesis
of breast cancer.

Introduction 

The search for a viral etiology of human breast cancer has been the
subject of numerous investigations, especially since the discovery of a
transmissible agent in milk causing breast cancer in mice (Bittner, 1942).
Representational difference analysis (RDA) is a recently developed technique
(Lisitsyn et al., 1993; Hubank and Schatz, 1994) that has been useful in
detecting viral sequences and unique genes. It was instrumental in the
discovery of herpes virus 8 (Chang et al., 1994), hepatitis virus TTV (Nishizawa
et al., 1997) and the novel gene TSP50 (Yuan et al., 1999). Using a modified
RDA (mRDA) technique, this study describes the isolation of a DNA intronic
fragment of the ELF3 gene in breast cancer cells which appears to be uniquely
retained in the cytoplasmic mRNA in breast cancer cells and cell lines.

Breast cancer cell lines and matched normal cell lines were developed
from malignant effusions. DNA from a cancer cell line was used as "tester" and
matched normal cell line DNA was used as "driver" in an mRDA method. Two
DNA fragments, denoted GC2 and GC3, unique to the cancer DNA, were found.
This report focuses on GC3, a 531 bp DNA fragment. This fragment was found
to be within intron 7 (bp 7514-8045) of the ELF3 gene (Chang et al., 1997;
Oettgen et al., 1997b; Tymms et al., 1997; Andreoli et al., 1997; Choi et al.,
1998).

In this study, GC3 appeared as a difference between breast cancer and
matched normal cells, and was present in the amplicon and genomic DNA
Southern blots of the cancer lines but not the matched controls. In order to
determine whether there was transcription of this GC3 intron 7 area,
cytoplasmic mRNA was analyzed by reverse transcription polymerase chain
reaction (RT-PCR). Using RT-PCR, cDNA was found to be retaining intron 7.
This observation was confirmed by application of the 5' and 3' RACE procedure
which revealed an ELF3 cDNA sequence including introns 4, 5 and 6 without
nucleotide mutation at the intron/exon junctions. In addition to the partially
unspliced cDNA, a fully spliced 1959 bp ELF3 cDNA sequence was isolated
which was identical to the mRNA of ELF3, and predicted to encode a 371 amino
acid protein sharing 98% homology to the ELF3 protein. Although the coding
sequence was almost identical to the published ELF3 gene, the 5' UTR was
different, and extended from 4976 to 5006 instead of from 4777 to 4888 of the
ELF3 nucleotide sequence (Tymms et al., 1997).

Intron retention of the GC3 intronic area was found in the cytoplasm of
breast cancer cell lines and in breast cancer tissue and appears as a pathological
defect which may be unique to breast cancer. Unspliced ELF3 mRNA in breast
cancer suggests altered regulatory pathways in the splicing of ELF3 mRNA. In
eukaryotic cells, most cytoplasmic mRNA does not contain unspliced sequences
as unspliced nuclear mRNA is enzymatically destroyed in the nucleus after
splicing (Darnell et al., 1997; Cramer et al., 2001; Hide et al., 2001; Stutz et al.,
1998; Krug, 1993; Hastings and Krainer, 2001). However, retroviruses (Cullen,
1998; Flint et al., 2000; Favaro and Arrigo, 1997) and some herpes viruses
(Cheung et al., 2000; Ellison et al., 2000; Kienzle et al., 1999) are able to
induce intron retention in mRNA which enables them to use this mechanism to
produce different viral proteins (Cullen, 1998; Flint et al., 2000; Favaro and
Arrigo, 1997) and allows them to alter the splicing of cellular proteins
important to the function of the virus (Cheung et al., 2000; Ellison et al., 2000).
The finding of intron retention in the ELF3 gene in breast cancer cells may be an
important finding in understanding the pathogenesis of breast cancer and
suggests a mechanism to search for a viral cause of breast cancer.

Materials and Methods

Cell Lines. Paired human breast cancer and normal cell lines were
established from malignant breast cancer effusions. All effusions were obtained
from patients with metastatic breast cancer using an investigational review
board approved protocol. Briefly, mononuclear cells from effusions were
isolated and cultured in RPMI media (GIBCO-BRL) with 20% fetal bovine serum
(FBS) at 37°C in a 5% CO2 atmosphere. After 2 to 3 days, non-adherent cells
were transferred to another flask and cultured separately. Cells were monitored
regularly for morphology and growth characteristics. The adherent cells were
passed by trypsinization and diluted 1:2 when adequate growth appeared. Non-
adherent cells were also passed at the same dilution. When independent and
continuous growth sustained recurrent passage, cytogenetic analysis was
performed in the Cell Genetics Laboratory of North Shore University Hospital
using standard cytogenetic techniques, which measure chromosome number and
morphology. Expression of epithelial glycoprotein (EGP2), a cell surface
glycoprotein present in most epithelial cells and tumors, and cytokeratin-19
(K19), a primitive keratin expressed by all epithelial cells, was assessed using
RT-PCR as described (Gazdar et al., 1998). MCF-7 human breast tumor cell
lines, U-937 human histiocytic lymphoma cell lines and Jurkat human T cell
leukemia cell lines were routinely cultured with RPMI 1640 (GIBCO-BRL)
supplemented with 10% FBS at 37°C in a 5% CO2 atmosphere. The MCF-7
human breast cancer cell line, human histiocytic lymphoma cell line (U-937)
and the human T cell leukemia cell line (Jurkat) were obtained from the
American Type Culture Collection (ATCC).

Modified RDA. mRDA was performed as described (Yuan et al., 1999).
In brief, two µg of DNA isolated from a breast cancer cell line (K151, tester) and
its matched normal cell line (driver) by the QIAamp DNA blood kit (Qiagen
Inc.) were cleaved with the restriction enzyme HpaII (10 U/µl; Boehringer
Mannheim) in a 50 µl reaction at 37°C overnight. Preparation of tester and
driver master amplicons and subtractive hybridization were performed as
described (Lisitsyn et al., 1993; Hubank and Schatz, 1994). After a second
round of subtractive hybridization/PCR amplification, the difference products
were subjected to 2% agarose gel electrophoresis and purified by a DNA gel
extraction kit (Qiagen, Inc). The purified DNA fragments were cloned in the
pPCR-Script Amp SK(+) cloning vector by using a PCR-Script Amp Cloning Kit
(Stratagene). The inserts from positive clones were amplified and used as
probes in master amplicon Southern blot. The candidate probes were then
further tested by human genomic DNA Southern blot.

Amplicon And Genomic DNA Southern Blotting. 6 µg of tester amplicon
DNA (K151 cancer cell lines) and driver amplicon DNA (K151 normal cell lines)
on 1.5% agarose gel were transferred to a positively charged nylon membrane
(Boehringer Mannheim) and immobilized by exposure to UV light. The
plasmids containing interesting inserts from RDA were used as templates with
T3 and T7 primers for probe labeling using the PCR DIG probe synthesis kit
(Boehringer Mannheim). Southern blotting and detection was carried out with
the non-radioactive Southern Blot detection kit (Genius, Boehringer Mannheim)
according to the instructions of the manufacturer. For genomic DNA Southern
blot, 5 µg of genomic DNA from the K151 cancer cell line and normal cell line
were digested with HpaII or MspI overnight and then hybridized with the probe
by using the same procedure as amplicon Southern blotting.
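
For orientation, HpaII and MspI both recognize the sequence CCGG and cut after
the first C. The Python sketch below predicts fragment lengths for a linear
sequence digested to completion at every CCGG site. It is an illustration under
stated assumptions, not the procedure used in the Example: the function names
are hypothetical, the test sequence is invented, and methylation is not
modeled, although HpaII (unlike MspI) is generally blocked when the internal
cytosine of the site is methylated.

```python
import re

RECOGNITION_SITE = "CCGG"  # shared by HpaII and MspI
CUT_OFFSET = 1             # both enzymes cut after the first C: C^CGG


def cut_positions(seq: str) -> list:
    """0-based positions at which the top strand is cut."""
    pattern = "(?=" + RECOGNITION_SITE + ")"
    return [m.start() + CUT_OFFSET for m in re.finditer(pattern, seq.upper())]


def fragment_lengths(seq: str) -> list:
    """Lengths of the fragments from a complete digest of a linear sequence."""
    boundaries = [0] + cut_positions(seq) + [len(seq)]
    return [end - start for start, end in zip(boundaries, boundaries[1:])]
```

Comparing the predicted fragment pattern for the two enzymes against the
observed Southern bands is one way a methylation difference at a CCGG site
could show up, since only the HpaII digest would be affected.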

5' And 3' Rapid Amplification Of cDNA Ends (5' RACE And 3' RACE). A
search for cDNA sequences was performed by using the SMART RACE cDNA
amplification kit (Clontech Inc.). In brief, total cellular RNA was isolated from
K151 and K259 cancer cell lines by using the High Pure RNA isolation kit
(Roche). Five hundred ng RNA was used for construction of the first-strand
cDNA library. For the 5' RACE, the cDNA was synthesized using a modified
lock-docking oligo (dT) primer and SMART II oligo primer provided in the kit.
For the 3' RACE, cDNA was constructed using a traditional reverse transcription
procedure, but with a special oligo (dT) primer provided by the manufacturer. The
protocol followed the instructions from the manufacturer. The primers used in
the SMART RACE procedure are listed in Table 1. The cDNA fragments derived
from 5' and 3' RACE were gel purified and sequenced by cloning and sequencing
protocol as described.

Table 1: Primers used in Examples

Primer name (a)  Sequence 5'->3' (SEQ ID NO:)                  Position (b)  Tm (c)
GC3-S            CCTGTCCACTGACTCCAGTG (SEQ ID NO:16)           7722-7741     57
GC3-AS           ACTTGGCCACAGCATGCAG (SEQ ID NO:17)            7923-7905     57
GC3 UPF-AS       ACCAAAGGCCATGCGGAGGCCAGAGAA (SEQ ID NO:18)    7572-7598     67
GC3 UPN-AS       CAACAACCCGCAGTGCCCCAGGAAGCCC (SEQ ID NO:19)   7523-7551     67
GC3 DF-S         GCAGGGCTGGCTGGCCTTGGGTGAGAGG (SEQ ID NO:20)   7943-7970     67
GC3 DN-S         CTTGCAGCGCCCAGAGGCACCCACCTG (SEQ ID NO:21)    8004-8030     67
GC3 (1-3)-S      GCTACCTGGCGGAACTGGATTTCTC (SEQ ID NO:22)      4819-4843     61
GC3 (1-3)-AS     CGCTTGCGTCGTACTTGTTCTTCTC (SEQ ID NO:23)      6240-6216     61
GC3 (3-6)-S      AAGACGCAGGTTCTGGACTGGATCAG (SEQ ID NO:24)     6180-6205     63
GC3 (3-6)-AS     TGGGATCCAGGTCCACGTCACTTC (SEQ ID NO:25)       7194-7171     63
GC3 (6-8)-S      TCCTCAGACTCCGGTGGAAGTGACG (SEQ ID NO:26)      7155-7179     63
GC3 (6-8)-AS     CCGGCTCAGCTTCTCGTAGGTCATG (SEQ ID NO:27)      8198-8174     63
GC3 (8-9)-S      AGCTCAACGAGGGCCTCATGAAGTG (SEQ ID NO:28)      8065-8089     61
GC3 (8-9)-AS     TCCCAGGACGATGGCTGACAATACAC (SEQ ID NO:29)     9352-9327     61
ES31             CCCCAGCCATGTACGTTGCTATCG (SEQ ID NO:30)       (β-actin)
ES33             GCCTCAGGGCAGCGGAACCGCTCA (SEQ ID NO:31)       (β-actin)
GC3DD-S          CCTGTGTCCAGGAGTACACTAGATCATC (SEQ ID NO:32)   8569-8596     67
INSE-S           AGAGGCAAGGGTCTCTACGTTG (SEQ ID NO:33)         8659-8680     62
INSE-AS          TCCCTGGCCTTAAAAGTCATGT (SEQ ID NO:34)         8774-8795     62

(a) S - sense primer; AS - antisense primer
(b) Nucleotide positions are numbered with reference to ELF3 genomic sequence
    AF110184 (SEQ ID NO:1)
(c) °C
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RNA Purification. All RNA isolations were extracted from 1-5x1 0 6 
exponentially growing cells by using the High pure RNA isolation kit (Roche, 
Indianapolis, IN) according to the manufacturer's protocol. RNA in the cell 
lysate was selectively bound to a glass fiber fleece in a microcentrifuge filter 
5 tube during DNase I treatment and DNA removal. The bound RNA was 
purified by washing steps and eluted in 75 \il nuclease-free water. All RNA 
isolates were tested for genomic DNA contamination by PCR amplification 
before synthesis of cDNA. For RNase and Dnase I digestion analysis, ~ 2 jig 
total cellular RNA isolated from the'K151 breast cancer cell line was digested 

10 with either 1 ^g of RNase (Roche, Indianapolis, IN) in a total of 200 pi ddH 2 0 
or 200 U of RNase-free DNase I (Roche, Indianapolis, IN) in 200 \il DNase 
dilution buffer at 37° C for 20 min. RNase or DNase I was then inactivated by 
incubation at 70° C for 10 min. The RNA in this mixture was then isolated using 
the same RNA isolation procedure as described. The RNA was eluted in 15 pi 

15 ddH 2 0. The RNA was quantified by measuring the absorbance at 260 and 280 
11111 (A26o/28o) and its integrity was verified on a formamide-agarose gel. 
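
The absorbance-based quantification mentioned above can be illustrated with a short calculation. This is a sketch under stated assumptions rather than the procedure actually used: it applies the commonly used conversion of about 40 µg/ml RNA per A260 unit at a 1 cm path length, and the function names and the readings shown are hypothetical.

```python
# Assumption: one A260 unit corresponds to roughly 40 ug/ml of RNA (1 cm path length).
RNA_UG_PER_ML_PER_A260 = 40.0


def rna_concentration_ug_per_ml(a260, dilution_factor=1.0):
    """Estimate RNA concentration of the undiluted sample from an A260 reading."""
    return a260 * RNA_UG_PER_ML_PER_A260 * dilution_factor


def purity_ratio(a260, a280):
    """A260/A280 ratio, often used as a rough indicator of protein contamination."""
    return a260 / a280


# Hypothetical readings for a 1:10 diluted sample (illustrative values only).
concentration = rna_concentration_ug_per_ml(a260=0.25, dilution_factor=10)
ratio = purity_ratio(a260=0.25, a280=0.125)
print(f"{concentration:.1f} ug/ml, A260/280 = {ratio:.2f}")
```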

Separation Of Nuclear And Cytoplasmic RNA. RNA was extracted from
the nuclear and cytoplasmic fractions of various cell lines. Cells (~5 x 10^6) were
washed 3 times with ice-cold phosphate-buffered saline (PBS) and then
disrupted with 375 µl lysis buffer (0.5% NP-40, 20 mM Tris-HCl, 100 mM NaCl,
5 mM MgCl2, 1 mM dithiothreitol, and 1000 U of RNasin per ml) for 5 min on
ice. This preparation was then gently centrifuged at 2000 rpm for 2 min. The
pellet, which consists of nuclei, was resuspended in 200 µl of PBS for nuclear
RNA isolation. The cytoplasm-enriched supernatant was centrifuged for
another 2 min at 12,000 rpm to remove any contaminating nuclei. The
supernatant was used for cytoplasmic RNA isolation. The RNA was then
purified from the separated cytoplasmic and nuclear fractions by using the same
protocol as for total cellular RNA isolation.

RT-PCR And DNA-PCR Analysis. Before cDNA synthesis, all RNA isolates
were tested with β-actin primers to assure that there was no genomic DNA
contamination in the RNA isolates. β-actin primers
(ES31: 5'-CCCCAGCCATGTACGTCGCTATCC-3' [SEQ ID NO:30] and
ES33: 5'-GCCTCAGGGCAGCGGAACCGCTCA-3' [SEQ ID NO:31]) were prepared to
amplify a 394 bp fragment of the expressed β-actin gene under
the same PCR conditions as the GC3 primers, which are listed in Table 1. cDNA was
synthesized from purified total RNA, nuclear RNA or cytoplasmic RNA at 42°C
for 30 min in the presence of oligo d(T)16 primer with MuLV reverse
transcriptase by using the RNA PCR kit (Perkin Elmer). PCR amplification (25 µl)
was performed in PCR buffer containing 0.2 µM of each primer, 2.5 µl of the
first-strand cDNA samples or 10-50 ng of DNA (for PCR), 200 µM each of
deoxynucleoside triphosphate (dNTP) and 2.5 U of Platinum Taq DNA
polymerase (Gibco). When the PCR products were used for sequencing
purposes, proofreading Pwo DNA polymerase (Roche) mixed with AmpliTaq
DNA polymerase (Perkin Elmer) (1:5 ratio) was used. Touch down PCR
was used to improve the specificity (Hastings and Krainer, 2001; Cullen, 1998).
The conditions of touch down PCR were as follows: initial denaturation was
carried out at 94°C for 3 min, followed by 10 cycles, each consisting of
denaturation at 94°C for 1 min, annealing at 5°C higher than the actual primer
annealing temperature for 1 min, and extension at 72°C for 1 min, and then
followed by 25 cycles, each consisting of denaturation at 94°C for 1 min,
annealing at the actual primer annealing temperature for 1 min, and extension at
72°C for 1 min. A final extension was carried out at 72°C for 10 min. The
amplified products were separated by electrophoresis on 2% agarose gels
containing ethidium bromide in TAE buffer (40 mM Tris-acetate, 1 mM EDTA).
The gel was photographed under UV light with Polaroid 677 film. The primers
used in the PCR and RT-PCR reactions in our study were designed with Gene Runner 3.0
(Hastings Software, Inc.) based on the ELF3 gene sequence in GenBank
(AF110184) (SEQ ID NO:1) and are listed in Table 1.
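
The touch down PCR conditions described above can be written out as an explicit cycling program. The sketch below is illustrative only. The function touchdown_program and its parameter names are hypothetical, and the annealing temperature of 63°C is used purely as an example value and is an assumption, not a statement of the temperature used for any particular primer pair.

```python
def touchdown_program(annealing_c, elevated_cycles=10, standard_cycles=25, offset_c=5):
    """Build the cycling program as a list of (step, temperature in C, seconds)."""
    program = [("initial denaturation", 94, 180)]
    # First phase: annealing 5 C above the actual primer annealing temperature.
    for _ in range(elevated_cycles):
        program.append(("denaturation", 94, 60))
        program.append(("annealing", annealing_c + offset_c, 60))
        program.append(("extension", 72, 60))
    # Second phase: annealing at the actual primer annealing temperature.
    for _ in range(standard_cycles):
        program.append(("denaturation", 94, 60))
        program.append(("annealing", annealing_c, 60))
        program.append(("extension", 72, 60))
    program.append(("final extension", 72, 600))
    return program


program = touchdown_program(annealing_c=63)  # 63 C is an illustrative value
total_seconds = sum(seconds for _, _, seconds in program)
print(len(program), "steps,", total_seconds / 60, "minutes of hold time")
```

The total counts hold times only; ramping between temperatures is instrument dependent and is not modeled.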

DNA Sequencing. The DNA fragments from RDA, and the cDNA
fragments from 5' RACE and 3' RACE, were cloned in the PCR-Script Amp SK(+)
cloning vector by using the PCR-Script Amp Cloning Kit (Stratagene). Plasmids
were purified with the Bio-Rad Plasmid Miniprep Kit, and sequenced with T3 and
T7 primers in both directions. The DNA fragments from the PCR reaction were
diluted 1:10 with dH2O and sequenced with the primers used in the PCR reaction.
Sequencing was done at the North Shore University Hospital (New York) DNA
Sequencing Facility using an ABI Prism 377 DNA Sequencer. Nucleotide and
protein BLAST of the National Center for Biotechnology Information was used
to search for homologous sequences (Altschul et al., 1990; Gish and States,
1993; Altschul et al., 1997).

Breast Cancer Tissue and Normal Tissue Samples: cDNA prepared from
breast cancer biopsies and normal tissue from the same patient is described in
Yuan et al. (1999) and was provided by Dr. H.P. Xu.

Results

Establishment Of Human Breast Tumor And Matched Normal Cell Lines.
Paired human breast cancer and normal cell lines were established from
effusions of patients with breast cancer. After 8 months in culture, adherent
cells (denoted K151) showed normal myofibroblast cell morphology with
normal chromosomes in cytogenetic analysis. K151 non-adherent cells became
partially adherent and showed morphologically malignant characteristics.
Malignant cells revealed polyploidy. Cytogenetic analysis revealed two extra
chromosome 1 copies, as well as numerous unassigned small chromosomal
fragments. These cells expressed both EGP2 and K19, while the K151
myofibroblast cell line only expressed K19. These two cell lines are referred to
as the cancer cell line and the normal cell line in mRDA analysis. Using the
same method, breast cancer cell lines denoted K234 and K259 were established
and used for characterization of the DNA fragments isolated from modified RDA
of K151.

Isolation Of A Highly Amplified DNA Sequence GC3 From Human Breast
Tumor Cell Lines By mRDA. The DNA isolated from K151 breast cancer cell
lines (tester) and matched normal cell lines (driver) was cleaved with the
HpaII enzyme and applied to the modified RDA protocol. After two rounds of
DNA amplification/subtraction and PCR amplification, difference products (DP2)
were isolated from breast tumor cell lines. The gel purified DP2 fragments were
cloned into the pPCR-Script Amp SK(+) cloning vector and amplified as described.
Among 21 clones, 9 clones had DNA fragment inserts of different sizes as defined by
restriction enzyme digestion. These were used as probes for amplicon Southern
blotting. The clones which hybridized only to tester amplicon (cancer) and not
to driver amplicon (normal) were sent for sequencing. The nucleotide BLAST
search showed that two clones, denoted GC2 and GC3, corresponded to nt
7677-8045 (368 bp) and nt 7514-8045 (531 bp) of the ELF3 gene, respectively
(using the numbering system of SEQ ID NO:1). The nucleotide BLAST search against
the GenBank Human Expressed Sequence Tags Database (EST) revealed that 365
bp of our GC3 is 98% homologous to a sequence tag of human cDNA (accession
number BG960569) derived from the Human Cancer Genome Project and that this
sequence is located within intron 7 of the ELF3 gene from nt 7514 to 7878.
The DNA fragment of GC3 had CCGG on both ends (SEQ ID NO:11).
The 5' terminus is located in a CpG island within intron 7 and the 3' terminus
extended 35 bp into the 5' end of exon 8 of the ELF3 gene. We focused
our attention on the larger GC3 DNA fragment. To confirm the difference
observed in the tester and driver amplicons, genomic DNA Southern blotting was
carried out by using the GC3 DNA fragment as a probe to hybridize to tester and
driver DNA. The same amount of genomic DNA digested by HpaII and MspI
from K151 cancer and matched normal cell lines was applied to Southern
blotting. The GC3 DNA fragment hybridized only to the DNA from the breast
cancer cell lines, and not to the DNA from the matched normal cell lines,
whether HpaII or MspI enzymes were used for digestion (FIG. 1).
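
Because HpaII and MspI both recognize the sequence CCGG, a fragment released by either enzyme is expected to be bounded by CCGG, as noted for GC3. A minimal sketch of locating such recognition sites in a sequence is given below. The function find_sites and the toy sequence are hypothetical and are not taken from SEQ ID NO:1 or SEQ ID NO:11.

```python
def find_sites(sequence, site="CCGG"):
    """Return 1-based start positions of every occurrence of the recognition site."""
    sequence = sequence.upper()
    positions = []
    index = sequence.find(site)
    while index != -1:
        positions.append(index + 1)
        index = sequence.find(site, index + 1)
    return positions


# Toy sequence for illustration only (not an ELF3 sequence).
toy_fragment = "CCGGATATTGCACCGGTTACCGG"
sites = find_sites(toy_fragment)
bounded_by_site = toy_fragment.startswith("CCGG") and toy_fragment.endswith("CCGG")
print(sites, bounded_by_site)
```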






To determine whether the GC3 DNA fragment exists exclusively in our
breast cancer cell lines, a sensitive PCR technique was employed. Primers
which amplify a 202 bp fragment from intron 7 of the ELF3 gene were
synthesized based on the sequence derived from GC3. PCR was carried out on
DNAs from 3 paired breast cancer and normal cell lines (K151, K234 and K259).
PCR products of ~200 bp were produced both in breast cancer cell lines and
normal cell lines (FIG. 2). The band appearing in the normal cells of K151 was
considerably weaker than that of the cancer cell line (FIG. 2). The result
showed that the GC3 DNA fragment in intron 7 of ELF3 selected by modified
RDA was not uniquely present in the DNA of the cancer cell lines. Nonetheless,
this sequence does appear as a difference using the less sensitive Southern
blotting and amplicon Southern blotting (FIG. 1). This difference thus appears
to be due to amplification of this gene product in the tester and not due to
mutation within this gene. RDA can produce a difference this way when a DNA
fragment is highly repeated or multiple copies are present in the tester in
contrast to the driver (Lisitsyn et al., 1995).

Retention Of GC3 In Cytoplasmic mRNA Of ELF3 Gene In Human Breast
Cancer Cells. RT-PCR was performed on the mRNA isolated from paired cell
lines (K151 and K234) by using the same GC3 primers. The results showed that
GC3 was expressed in the breast cancer cell lines but not in matched normal cell
lines (FIG. 3). Sequence analysis of this 202 bp RT-PCR product showed 100%
homology to the GC3 sequence defined by GC3 primers. cDNA from six paired
human breast cancer and matched normal tissues, provided by Dr. H.P. Xu and
prepared as described in Yuan et al. (1999), was also examined for expression
of intron 7 with GC3 primers. GC3 was present in the mRNA of 4 of 6 breast
cancer tissues, but not in normal tissue (FIG. 4). Expression of GC3 in breast
cancer cell lines K151 and K234 and in most breast cancer tissues indicates that intron
retention occurs in many breast cancer cells. In order to exclude RT-PCR
products that might have resulted from amplification of contaminating genomic
DNA in the preparation of RNA, differential DNase I and RNase digestion was
performed on the total RNA preparation from the K151 cancer cell line before cDNA
synthesis. RT-PCR showed that GC3 and β-actin products were generated only from
the RNA isolated after DNase I treatment, and not from the sample after
RNase digestion (FIG. 5). This confirmed that GC3 was retained in the RNA
fraction of the cells and was not there as a result of genomic DNA
contamination in our RNA preparation prior to reverse transcription. To
elucidate whether GC3 is retained in the cytoplasmic mRNA of the breast cancer
cells, RNAs were purified from nuclear and cytoplasmic fractions prepared from
the K151 and the MCF-7 human breast cancer cell lines, from the human
histiocytic lymphoma cell line (U-937) and from the human T cell leukemia cell line
(Jurkat). cDNA was prepared from these RNAs, and β-actin and GC3 primers
were used to detect normal exonic β-actin and abnormal intronic GC3. The
same GC3 and β-actin primers were used on the RNA prepared prior to
preparation of the cDNA from these cells to rule out any genomic DNA
contamination prior to reverse transcription. In an RT-PCR reaction, ~200 bp
GC3 products were produced in the nuclear and cytoplasmic RNA of both the
K151 and MCF-7 breast cancer cell lines (FIG. 6). GC3 was also weakly
produced in the nuclear RNA but not in the cytoplasmic RNA of the U-937 cell
line. There was no GC3 RT-PCR product in either the nuclear or cytoplasmic
RNA from the Jurkat cell line. No GC3 or β-actin amplification occurred in any
nuclear or cytoplasmic RNA samples prior to the reverse transcription step,
excluding any genomic DNA contamination in the RNA isolates. The positive
β-actin results in the RT-PCR reaction demonstrated the integrity of the RNA and
assured that equal amounts of RNA were present in each sample (FIG. 6).
mRNA was further purified from all cytoplasmic and nuclear RNA extracts by
oligo (dT)20-coated magnetic beads. This mRNA was then subjected to RT-PCR
and the cDNA was tested with GC3 and β-actin primers. The same results were
obtained with this method of RNA purification. GC3 amplification was seen
only in the breast cancer cell lines K151 and MCF-7 but not in the U-937 and Jurkat
cell lines (data not shown). These results confirmed that GC3 is retained in the
cytoplasmic mRNA of human breast cancer cells.

Presence Of Partially Unspliced ELF3 mRNA Sequence In Human Breast
Cancer Cell Lines. To verify the RT-PCR results and determine that GC3 is
retained in cytoplasmic mRNA of breast cancer cells as part of intron 7 of ELF3,
the RACE technique was used to determine the cDNA sequence of the ELF3 gene.
RNA was extracted from K151 and K259 cell lines as described above. After
RNA extraction, RNA preparations were screened to assure the absence of
genomic DNA contamination using PCR amplification with GC3 primers as
shown in FIG. 5A. After establishing the 3' and 5' RACE cDNA libraries, GC3
was confirmed to be present in these libraries using the GC3 primers (FIG. 7).
When GC3 UPF (SEQ ID NO:18) and GC3 UPN (SEQ ID NO:19) were used as
the first primer and nested primer, respectively, in our 5' RACE experiments, an
~1000 bp DNA fragment and an ~300 bp DNA fragment were produced in the
K151 cDNA library, and ~400 bp and ~100 bp DNA fragments were produced
in the K259 cDNA 5' RACE library (FIG. 8). The ~1000 bp DNA fragment from
the K151 5' RACE was gel purified and cloned. All nine positive plasmids
containing this DNA fragment were selected. Three of these were sequenced.
The sequence from 2 of the 3 sequenced positive plasmids showed 100%
homology to 1002 bp of the ELF3 genomic DNA sequence (AF110184) from
6550 to 7551 (SEQ ID NO:12), which contains the entire introns 4, 5, and 6 and
71 bp from the 5' end of intron 7 (FIG. 9). All intron/exon splice junction
borders conform with the splice site consensus G/GT...C/AG rule without any
single nucleotide mutation. The third sequenced clone had 100% homology to
the normal cDNA sequence of ELF3, which contains exon 1 to exon 7 without
any intron retention.
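
The splice junction check referred to above (introns beginning with GT and ending with AG) can be expressed as a simple test on an intron sequence. The sketch below is an illustration under stated assumptions: the function conforms_to_gt_ag is hypothetical, and the toy sequences are invented for the example and are not ELF3 intron sequences.

```python
def conforms_to_gt_ag(intron):
    """True if the intron starts with GT (donor) and ends with AG (acceptor)."""
    intron = intron.upper()
    return len(intron) >= 4 and intron.startswith("GT") and intron.endswith("AG")


# Toy intron sequences for illustration only (not taken from SEQ ID NO:1).
toy_introns = {
    "toy intron A": "GTAAGTCTTCCTTTCTCAG",
    "toy intron B": "GTGAGTTTTTTCCCACAG",
    "toy intron C (altered donor)": "ATAAGTCTTCCTTTCTCAG",
}
results = {name: conforms_to_gt_ag(seq) for name, seq in toy_introns.items()}
for name, ok in results.items():
    print(name, "conforms" if ok else "does not conform")
```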

When GC3 DF(S) (SEQ ID NO:20) and GC3 DN(S) (SEQ ID NO:21)
were used as the first primer and nested primer in the 3' RACE experiments, an
~1000 bp DNA fragment was produced in both the K151 and K259 cDNA 3' RACE
libraries (FIG. 8). The product from K151 was gel purified and cloned.
Sequence analysis revealed that all the clones contained normal ELF3 cDNA
with properly spliced exon 8 and exon 9, the 3' UTR and a polyA tail. In order
to demonstrate GC3 (as part of intron 7) retention in the ELF3 mRNA, 5' RACE
was pursued with GC3 primers. The sequence analysis showed homology to the
ELF3 genomic sequence from 7270 to 8198, which contained the entire intron
7. The sequencing results indicated that GC3 was retained as part of intron 7 of
ELF3 in the mRNA pool. Additionally, introns 4, 5, 6 and 7 were retained in
their entirety in the ELF3 mRNA from breast cancer cell line K151. The 5' RACE
and 3' RACE results from the breast cancer cell line K151 are summarized in FIG.
9.

Presence of Normal ELF3 mRNA In Human Breast Cancer Cells. The
fully spliced mRNA from our breast cancer cells, provided herein as SEQ ID
NO:2, is 1959 bp and is predicted to encode a 371 amino acid protein (SEQ ID
NO:4), which shares 98% homology with the ELF3 protein sequence. Even
though the coding sequence (CDS) was 98% homologous to the published
cDNA sequence of the ELF3 gene (Oettgen et al., 1999; Oettgen et al., 1997a;
Brembeck et al., 2000; Lisitsyn et al., 1995), the 5' UTR was different and was
derived from 4876 to 5006 instead of 4777 to 4888 of the ELF3 genomic DNA
sequence (SEQ ID NO:1). The presence of fully spliced mRNA of the ELF3 gene
in our breast cancer cells was further confirmed by sequence analysis of
RT-PCR products, in which the PCR reaction was performed on the K151 and K259
cDNA libraries prepared for the 5' RACE (FIG. 10). Primers were chosen which
spanned intronic areas (Table 1: GC3(1-3) S and AS (SEQ ID NO:22 and 23);
GC3(3-6) S and AS (SEQ ID NO:24 and 25); GC3(6-8) S and AS (SEQ ID NO:26
and 27); GC3(8-9) S and AS (SEQ ID NO:28 and 29)). The fully spliced exons 1,
2, 3 (343 bp), exons 3, 4, 5, 6 (460 bp), exons 6, 7, 8 (369 bp) and exons 8, 9
(409 bp) were amplified with four different pairs of primers, indicative of
appropriate splicing of introns in these products. The result indicates that fully
spliced mRNA of ELF3 constitutes much of the ELF3 mRNA. The sequence
analysis reveals normal splicing of all 8 introns from mRNA of ELF3. The
RT-PCR and cDNA sequence analysis indicated that both partially unspliced ELF3
mRNA, which contains introns 4, 5, 6 and 7, and fully spliced normal ELF3 mRNA
are present in human breast cancer cell lines (FIG. 10).

Discussion

Malignant breast cancer effusions were used to obtain normal and cancer
cell lines from the same patient in order to find genetic differences between the
autologous cell lines. An mRDA technique using the malignant cell lines as a
tester and the normal cell lines as a driver was utilized. A 531 bp DNA
fragment, denoted GC3 (SEQ ID NO:11), positioned at 7514-8045 in intron 7
and exon 8 of the ELF3 gene, was obtained as a difference. As the GC3
sequence was normal, amplification of GC3 was considered responsible for the
difference, as RDA can detect small restriction fragments with different
sequences, but can also detect amplified sequences that are enriched by kinetic
factors and cannot be completely subtracted by the driver (Lisitsyn et al., 1995).
Cytogenetic analysis of the malignant line K151 used in the procedure revealed
two extra copies of chromosome 1, the site of ELF3. Fluorescence in situ
hybridization (FISH) has shown ELF3 amplification in the SK-BR-3 (5 to 6
copies) and BT-474 (4 copies) breast cancer cell lines, which results
predominantly from an increase in chromosome 1q number (Chang et al.,
1997).

As GC3 was assumed to be upregulated in the malignant clone,
expression of this area was sought and found by RT-PCR. GC3, as part of intron
7 of the ELF3 gene, was retained in the ELF3 cytoplasmic mRNA transcript in breast
cancer cell lines and most breast cancer tissues (4 of 6) but not in matched normal
cell lines and tissues. Great care was taken to exclude DNA contamination as an
artifactual cause of the findings. The 5' and 3' RACE procedures were used to
confirm GC3 intron sequences in ELF3 cytoplasmic mRNA. These procedures
further showed that there was retention of introns 4, 5, and 6 in mRNA, along
with fully spliced 1959 bp normal transcripts of mRNA.

This is the first time that transcripts of ELF3 with multiple introns were
found to be retained in cytoplasmic ELF3 mRNA in breast cancer. Intron
retention in breast cancer cell lines and breast cancer tissue has also not been
previously described. Clearly this is a pathological process and distinguishes
breast cancer cells from normal cells. These findings indicate that abnormal
mRNA processing is involved. Aberrant mRNA processing may take place by a
variety of mechanisms, and may cause appropriate effects as well as
pathological states. Exon skipping, abnormal splice site selection, and full
intron selection have been described (Stutz and Rosbash, 1993; Krug, 1993;
Stella et al., 2001; Hellwinkel et al., 2001; Beghini et al., 2000). Intron
sequences have been shown to have motifs which can alter gene expression by
influencing transcription rate (Matsumoto et al., 1998). Introns may code for
independent proteins (Krug, 1993), may extend the coding sequence of an
adjoining exon, or may provide alternate translation termination signals
(Beghini et al., 2000). The appearance of introns in cytoplasmic mRNA is
unusual in eukaryotic cells, though physiologic alternate splicing provides a
mechanism for expanding protein expression (Hide et al., 2001). Splice site
mutation may slow or prevent intron removal, but these incompletely spliced
mRNAs are not transported into the cytoplasm (Stutz and Rosbash, 1998; Krug,
1993). Export of mRNA through the nuclear membrane usually requires
splicing of all introns (Darnell et al., 1997; Cramer et al., 2001).

A database of aberrant splicing in mammalian genetic disorders has
shown that genomic mutation with resultant intron retention is relatively rare
(Nakai and Sakamoto, 1994). A nonsense mutation causing exon skipping and
intron retention of LKB1/STK11, a Peutz-Jeghers syndrome gene, may
contribute to tumorigenesis in a small fraction of malignant melanomas
(Guldberg et al., 1999). Intron retention of non-mutated ELF3 (introns 4, 5, 6
and 7) in breast cancer cells and tissue containing multiple normal stop codons
excludes alternate splicing as a cause.

Intron retention associated with cancer cells is seen with the CD44 gene.
Intron 9 and intron 18 of the CD44 gene are retained in the cytoplasmic mRNA
transcripts in tumors. CD44 is known to be composed of at least 20 exons, ten
or more of which can be alternatively spliced to produce various isoforms
(Cooper, 1995; Matsumura et al., 1999; Goodison et al., 1998; Yoshida et al.,
1995).

While intron retention appears rare in cancer cells, it is commonly used
by viruses to make more proteins from a simple nucleic acid organization. In
HIV-1, the Rev protein is able to bind to the Rev response element and prevent
the splicing out of introns, allowing full transcripts of the HIV RNA to enter the
cytoplasm. It protects the viral RNA from intron splicing and helps bind the
mRNA to the nucleopore for external transport of unspliced mRNA to the
cytoplasm (Cullen, 1998; Flint et al., 2000; Favaro and Arrigo, 1997). In
herpes simplex 1, the protein ICP27 acts like Rev to make the cellular gene for
α-globin appear in an unspliced fashion in the cytoplasm. ICP27 may act after
pre-mRNA to prevent degradation of some intron-containing fragments and
then help those fragments out of the nucleus through an alternative nuclear
export pathway (Cheung et al., 2000; Ellison et al., 2000).

Some viruses have been speculated to cause human breast cancer,
including a retrovirus (Keydar et al., 1984; Moore et al., 1971; Wang et al.,
1998; Wang et al., 1995; Pogo et al., 1997; Al-Sumidaie, 1988), a polyoma virus
(Fluck et al., 1996) and a herpes virus (Bonnet et al., 1999). One could
speculate that the ELF3 intron retention could be caused by some viral product
which acts indirectly on the ELF3 gene, similar to the way ICP27 acts on the
α-globin gene. The appearance of intron retention of the ELF3 gene could thus be
used to search for a potential viral protein which may result in breast cancer.

Example 2. Cytoplasmic Intron Retention And A New Alu Element In The mRNA 
Of The ELF3 Gene In Peripheral Blood Mononuclear Cells From Patients With 
Breast Cancer 

Example Summary

Example 1 describes the retention of intron 7 of the ELF3 gene in
cytoplasmic mRNA in breast cancer tissue and breast cancer cell lines but not in
autologous normal breast epithelial cells. That finding, along with retention of
introns 4, 5 and 6 of ELF3 and expression of fully spliced ELF3 mRNA, was
demonstrated using reverse transcriptase PCR (RT-PCR) and by 5'- and 3'- rapid
amplification of cDNA ends (RACE). As described in this Example, downstream
genomic DNA walking from intron 7 of ELF3 led to the discovery of a new Alu
element, termed Alu kwd (SEQ ID NO:13), which was found inserted in an
antisense orientation between nt 8762 and nt 8763 of the ELF3 gene (SEQ ID
NO:1). This Alu kwd was found to be retained in the cytoplasmic mRNA as a
fragment of intron 8 in breast cancer tissues and cell lines, similar to intron 7. In
order to see if Alu kwd and intron 7 retention occurred in cells other than breast
epithelium, peripheral blood mononuclear cells (PBMCs) from breast cancer
patients were tested for these gene fragments in the total RNA from these
PBMCs. Great care was taken to assure that there was no contamination of the
RNA with genomic DNA prior to creation of cDNA libraries. PBMCs from 13 of
28 patients with ductal carcinoma in situ (DCIS) with or without invasion were
found to have intron 7 retention while 10 of 28 had Alu kwd retention. All
patients with Alu kwd had concomitant intron 7 retention. Three of 25 patients
without DCIS but with invasive duct cancer or invasive lobular cancer had
intron 7 and/or Alu kwd retention. PBMCs from only 2 of 20 normal patients had
intron 7 retention while 0 of 20 normals had Alu kwd retention. The association of
retention of intron 7 and/or of Alu kwd with DCIS was highly statistically
significant (p = 0.008) using the Chi square test. The presence of intron
retention of this epithelium-specific mRNA within PBMCs has not been
previously shown. The cause of this unusual intron retention in these cells is
not known, but this finding is useful in understanding the pathogenesis of DCIS,
and as the basis for an assay to distinguish DCIS from other forms of breast
cancer. A better understanding of the biology of ELF3 might provide a new
target for developing better chemotherapy for breast cancer.
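
A Chi square test of association on counts of this kind can be sketched as follows. This is an illustration only. The text does not specify how the contingency table was constructed, so the 3 x 2 layout below (DCIS, non-DCIS breast cancer, and normal, each split into retention present or absent) is an assumption, and the result is not expected to reproduce the reported p value. The function names are hypothetical.

```python
import math


def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


def p_value_two_dof(statistic):
    """Upper-tail probability of a chi-square variable with exactly 2 degrees of freedom."""
    return math.exp(-statistic / 2)


# Assumed layout: rows are DCIS, non-DCIS breast cancer, normal;
# columns are retention present, retention absent (counts taken from the summary above).
counts = [[13, 15], [3, 22], [2, 18]]
statistic, dof = chi_square(counts)
p = p_value_two_dof(statistic)
print(f"chi-square = {statistic:.2f}, dof = {dof}, p = {p:.4f}")
```

The closed-form p value used here is valid only for 2 degrees of freedom; other table shapes would need a general chi-square distribution function.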

Introduction

In this Example, ELF3 gene walking downstream of intron 7 led to the
discovery of a previously undescribed Alu element inserted within another Alu
element in a reverse orientation within intron 8 of the ELF3 gene. This Alu,
designated Alu kwd, is also found retained in cytoplasmic mRNA in breast cancer
cells and breast cancer tissue along with the retention of a fragment of intron 7
which we designate as GC3. These phenomena were explored further using
normal cells from breast cancer patients to determine whether there is a general
error in ELF3 splicing, and to determine whether this Alu kwd might be linked to
the cytoplasmic intron retention discussed in Example 1. Accordingly, we chose
to study peripheral blood mononuclear cells from breast cancer patients to
determine whether there might be some global defect in splicing of ELF3 in
otherwise normal cells from these patients.

This investigation resulted in the finding of retention of Alu kwd
along with the GC3 fragment of intron 7 in cytoplasmic mRNA in PBMCs from
women whose breast cancer pathology indicated the presence of ductal
carcinoma in situ (DCIS), with or without invasive carcinoma. This aberrant
retention of Alu and intron sequences was seen infrequently in normal
patients without breast cancer, and in other forms of breast cancer in which
DCIS was not seen pathologically. The association of intron retention in PBMCs
with DCIS has not been previously described. This particular form of
breast pathology (i.e., DCIS) appears to be a major precursor in the
development of invasive ductal carcinoma. The finding of ELF3 gene expression
in PBMCs is also a novel finding for this gene, which heretofore was believed to be
expressed only in epithelial cells and not in lymphoid tissue (Tymms et al.,
1997; Chang et al., 1997; Andreoli et al., 1997; Choi et al., 1998; Chang et al.,
1999; Oettgen et al., 1999; Oettgen et al., 1997a; Brembeck et al., 2000; Chang
et al., 2000).

Materials and Methods

Human tumor cell lines. Human breast cancer and matched normal cell
lines (K151, K234 and K259) were established in our laboratory as described in
Example 1, and maintained with 20% FBS-1640 media in T75 flasks at 37°C in
a 5% CO2 atmosphere. MCF-7 (human breast cancer), U-937 (human
histiocytic lymphoma), Jurkat (human T cell leukemia) and C33-A (human
cervical cancer) cell lines were obtained from the American Type
Culture Collection (ATCC) and routinely maintained in RPMI 1640
(GIBCO-BRL) supplemented with 10% FBS at 37°C in a 5% CO2 atmosphere.

Genomic DNA Walking. DNA was isolated from cells using the QIAamp
DNA blood kit (Qiagen Inc.). The Universal GenomeWalker kit (Clontech
Laboratories, Inc.) was used for genomic DNA walking based on the instructions
provided by the manufacturer. Briefly, genomic DNA was digested by DraI,
EcoRV, PvuII and StuI overnight and ligated with the adaptor from the kit. The
uncloned, adaptor-ligated genomic DNA fragments were used as genomic-walker
libraries for polymerase chain reaction (PCR) amplification. Primary
PCR used the outer adaptor primer provided in the kit (AP1) coupled with
either sense (GC3 DF)(SEQ ID NO:20) or antisense (GC3 UPF)(SEQ ID NO:18)
primers derived from known sequences for downstream and upstream walking,
respectively. The primary PCR mixture was then diluted and used as a template
for nested PCR with a nested adaptor primer from the kit (AP2) combined with
either nested sense (GC3 DN)(SEQ ID NO:21) or antisense (GC3 UPN)(SEQ ID
NO:19) primers. The GC3 DD (SEQ ID NO:32) primer was used for further
downstream walking in the first and nested PCR reactions. The sequences of
the primers are listed in Table 1 (in Example 1). Each of the DNA fragments
that begin in a known sequence at the 5' end of antisense primers (upstream
walking) or the 3' end of sense primers (downstream walking) and which
extend into the unknown adjacent genomic DNA were cloned and sequenced as
described below.

Sequencing and GenBank searching. The DNA fragments from genomic
walking were gel purified by using the Wizard PCR preps DNA purification
system (Promega Corp.) and cloned in the pPCR-Script Amp SK(+) cloning vector
by using the PCR-Script Amp Cloning Kit (Stratagene). Plasmids were purified
using the Plasmid Miniprep Kit (Bio-Rad Laboratories), and sequenced with T3
and T7 primers in both directions. For PCR product sequencing, the DNA
fragments from the PCR reaction were diluted 1:10 with distilled H2O and
sequenced with the primers used in the PCR reaction. The sequencing was done at
the North Shore University Research Institute (New York) DNA Sequencing
Facility using an ABI Prism 377 DNA Sequencer. Nucleotide BLAST of the
National Center for Biotechnology Information was used for searching for
homologous sequences (Altschul et al., 1990; Gish and States, 1993; Altschul et
al., 1997).

RNA extraction. In this study, all RNA extraction was carried out with
the High Pure RNA isolation kit (Roche, Indianapolis, IN) according to the
manufacturer's protocol. Any co-purified DNA was ultimately digested with
DNase I. All RNA isolates were tested for genomic DNA contamination by PCR
amplification before reverse transcription to cDNA. Isolation of nuclear RNA
and cytoplasmic RNA was performed according to a basic protocol (Ausubel,
1995) with slight modification. Briefly, freshly prepared cell pellets were
suspended in 200 µl of lysis buffer containing the nonionic detergent NP-40 for 5
minutes on ice. The lysates were centrifuged at 2000 rpm to separate a
cytoplasmic fraction (supernatant) and a nuclear fraction (cell pellet). The
supernatant containing the cytoplasmic extract was transferred to a fresh tube.
The pellet, which consisted of nuclei, was resuspended in 200 µl of PBS buffer
for nuclear RNA isolation. The supernatant was used for cytoplasmic RNA
isolation after further centrifugation for 2 min at 12,000 rpm to further remove
any contaminating nuclei. The RNA from the separated cytoplasm and nuclei
was prepared by using the same protocol as total cellular RNA isolation.
RNase and DNase I digestion analyses were performed to assure that there
was no DNA contamination of RNA isolates prior to conversion to cDNA.
Approximately 500 ng of total RNA was digested with either 5 µg of RNase or
200 U of RNase-free DNase I (Roche, Indianapolis, IN) at 37°C for 20 min.
After incubation, RNase or DNase I was inactivated by incubation at 70°C for
10 min. The reaction mixtures were subjected to the same procedure as RNA
isolation.

Preparation of cDNA by RT-PCR and PCR. cDNA was synthesized from 
purified total RNA at 42°C in the presence of oligo d(T)16 with MuLV reverse 
transcriptase using the Perkin Elmer RNA PCR kit. Eight (8) paired cDNAs from 
breast cancer tissues and matched normal breast tissues were prepared as 
described in Example 1. PCR amplifications (25 µl) were performed in PCR 
buffer containing 0.2 µM of each primer, 2.5 µl of the first-strand cDNA samples 
or 10-50 ng of DNA (for PCR), 200 µM of each dNTP and 1 U of Platinum Taq 
DNA polymerase (Gibco). When the PCR products were used for sequencing, 
proofreading Pwo DNA polymerase (Roche) mixed with AmpliTaq DNA 
polymerase (Perkin Elmer) (1:5 ratio) was used. Primers GC3 S (SEQ ID 
NO:16) and GC3 AS (SEQ ID NO:17) were used to amplify 202 bp of intron 7 
of ELF3; primers INSE-S (SEQ ID NO:33) and INSE-AS (SEQ ID NO:34) were 
used to amplify a 451 bp sequence of intron 8 of ELF3 if Alukwd is inserted, or a 
136 bp DNA fragment if Alukwd is not inserted (Table 1). Touch down PCR was 
used in PCR reactions to improve the specificity (Don et al., 1995; Roux, 1995). 
The conditions of touch down PCR for GC3 and β-actin amplification were as 
follows: initial denaturation at 94°C for 3 min, followed by 10 cycles each of 
denaturation for 1 min at 94°C, primer annealing for 1 min at 62°C and 
extension for 1 min at 72°C, followed by 25 cycles of denaturation for 1 min at 
94°C, primer annealing for 1 min at 57°C and extension for 1 min at 72°C, and 
then a final extension for 10 min at 72°C. For Alukwd amplification, the annealing 
temperature was 64°C for 10 cycles and 62°C for the following 25 cycles. The 
amplified products were separated by electrophoresis on 1.5% agarose gels 
containing ethidium bromide in TAE buffer (40 mM Tris-acetate, 1 mM EDTA). 
The gel was photographed under UV light with Polaroid 677 film. 
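
The two-stage cycling program described above can be written out as data, which makes the cycle counts and hold times easy to check. The following Python sketch is illustrative only: the `Step` class and the function names are hypothetical, and it assumes that the Alu amplification uses the same denaturation, extension and hold times as the GC3 program, since the text states only its annealing temperatures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    temp_c: int
    minutes: int


def two_stage_program(anneal_first, anneal_second):
    """Return (repeat_count, steps) blocks for the cycling program in the text."""
    def cycle(anneal_temp):
        return [
            Step("denaturation", 94, 1),
            Step("annealing", anneal_temp, 1),
            Step("extension", 72, 1),
        ]

    return [
        (1, [Step("initial denaturation", 94, 3)]),
        (10, cycle(anneal_first)),    # first 10 cycles, higher annealing temperature
        (25, cycle(anneal_second)),   # remaining 25 cycles, lower annealing temperature
        (1, [Step("final extension", 72, 10)]),
    ]


def total_hold_minutes(program):
    """Sum the programmed hold times (ramp times are ignored)."""
    return sum(repeats * sum(step.minutes for step in steps)
               for repeats, steps in program)


def amplification_cycles(program):
    """Count three-step amplification cycles."""
    return sum(repeats for repeats, steps in program if len(steps) == 3)


gc3_program = two_stage_program(62, 57)   # GC3 and beta-actin
alu_program = two_stage_program(64, 62)   # Alu primers (other settings assumed equal)

print(amplification_cycles(gc3_program), total_hold_minutes(gc3_program))
```

Encoding the program this way keeps the two primer sets side by side, so that only the annealing temperatures differ between them.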

Clinical Material. After informed consent, whole blood was collected in 
EDTA tubes from breast cancer patients at North Shore Hematology/Oncology 
Associates (New York), a general medical oncology group practice. They were 
selected only by a diagnosis of breast cancer and willingness to consent to this 
study. The patient charts were retrospectively reviewed for pathological 
reports, staging, and demographic information. All clinical information was 
obtained without knowledge of the laboratory findings. PBMCs were isolated 
from whole blood by Ficoll-metrizoate (Lymphoprep, Nyegaard, Oslo) density 
gradient centrifugation. Cell pellets were preserved at -70°C for DNA and RNA 
isolation. PBMCs from 20 unknown blood donors were purchased 
commercially. 
Results 

Antisense insertion of a unique 315 bp Alu element within intron 8 of 
the ELF3 gene. We have shown that a fragment (GC3) (SEQ ID NO:11) of 
intron 7 of the ELF3 gene appeared as a difference in representational 
difference analysis (RDA) performed on a breast cancer cell line (tester) and a 
normal cell line (driver) prepared from the same neoplastic breast cancer 
effusion (Example 1). More importantly, intron 7 (GC3) was shown to be 
retained in the cytoplasmic ELF3 mRNA, which was demonstrated by RT-PCR 
and confirmed by cDNA sequencing. In order to search for any mutation or 
insertion near the intron 7 area which may have contributed to the retention of 
this intron in mRNA, genomic walking libraries were constructed from genomic 
DNA isolated from K151 breast cancer cell lines as described in Materials and 
Methods. Primers were designed based on the GC3 DNA sequence derived from 
K151 breast cancer cell lines for upstream walking (GC3 UPF [SEQ ID NO:18] 
and GC3 UPN [SEQ ID NO:19]) and downstream walking (GC3 DF [SEQ ID 
NO:20] and GC3 DN [SEQ ID NO:21]) (Table 1). DNA fragments from StuI 
and PvuII digested genomic walking libraries were produced for upstream 
walking. The sequence analysis of the 529 bp DNA fragment from the StuI 
library showed 98% homology to the ELF3 gene from nt 7022 to nt 7511. The 
659 bp DNA fragment from the PvuII library showed 94% homology to the ELF3 
gene from nt 6892 to nt 7511. In the downstream walking library, a DNA fragment 
from the DraI library was predominant. The sequence revealed this to be a 629 
bp DNA fragment with 96% homology to the ELF3 gene from nt 8003 to nt 
8632. The next 40 bp sequence began with an A-enriched region, without 
homology to the ELF3 gene. To define this area more extensively, further 
downstream walking was carried out by using a primer (GC3 DD) (SEQ ID 
NO:32) located at nt 8569 to 8597 of ELF3. Another ~950 bp DNA fragment 
was produced in the StuI library by this further downstream walking. Sequence 
analysis revealed that this DNA fragment contained the sequence from nt 8569 
to 9228 of the ELF3 gene. However, there was an antisense insertion of a 
unique 315 bp Alu element (SEQ ID NO:13) (designated Alukwd) within intron 8 
between nucleotides 8762 and 8763 of the ELF3 gene, which does not exist in 
the published ELF3 gene sequence deposited by Chang et al. (AF110184) (SEQ 
ID NO:1). This insertion occurs at the end of a 121 bp Alu region just after a 17 
bp repeat from nt 8745 to nt 8762 (Appendix, under SEQ ID NO:13). This 
insertion is within intron 8 of the ELF3 gene, an area important for the ets 
transcription regulation function of this gene (Tymms et al., 1997; Chang et al., 
1997). The Alukwd sequence provided as SEQ ID NO:13 is only 85% 
homologous to any known Alu sequences deposited in GenBank. The genomic 
walking results and the Alukwd insertion site results are summarized in FIG. 11. 
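
The coordinates quoted above can be cross-checked against the exon annotation of AF110184 reproduced in the Appendix. The following Python sketch is only an illustration: the exon and repeat coordinates are copied from that annotation, while the function and variable names are hypothetical. It derives the intron spans from adjacent exons and asks which intron contains the reported insertion site.

```python
# Exon coordinates (1-based, inclusive) from the AF110184 feature table in the Appendix.
EXONS = {
    1: (4777, 4888), 2: (5311, 5481), 3: (6139, 6360),
    4: (6526, 6618), 5: (6822, 6941), 6: (7129, 7218),
    7: (7364, 7480), 8: (8011, 8206), 9: (9076, 9872),
}
ALU_REPEAT = (8655, 8775)   # annotated Alu repeat_region within intron 8
INSERTION_AFTER = 8762      # insertion reported between nt 8762 and nt 8763


def intron_spans(exons):
    """Intron n is the gap between exon n and exon n+1."""
    numbers = sorted(exons)
    return {left: (exons[left][1] + 1, exons[right][0] - 1)
            for left, right in zip(numbers, numbers[1:])}


def containing(position, spans):
    """Return the key of the span that contains position, or None."""
    for number, (start, end) in spans.items():
        if start <= position <= end:
            return number
    return None


INTRONS = intron_spans(EXONS)
print("intron 7:", INTRONS[7])
print("intron 8:", INTRONS[8])
print("insertion site lies in intron", containing(INSERTION_AFTER, INTRONS))
print("annotated Alu repeat length:", ALU_REPEAT[1] - ALU_REPEAT[0] + 1)
```

On these coordinates the annotated Alu repeat spans 121 bp and the insertion site falls inside it, in agreement with the description above.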
To determine whether the antisense Alukwd element insertion also exists 
in other breast cancer cell lines, breast cancer tissues or normal cells, another 
pair of primers (INSE-S [SEQ ID NO:33] and INSE-AS [SEQ ID NO:34]) was 
designed, which amplifies a 451 bp DNA fragment in intron 8 of ELF3 where 
Alukwd was found, flanked by normal intron 8 sequences, as shown in Appendix, 
under SEQ ID NO:14. PCR analysis was carried out using these primers on the 
DNA from breast cancer cell lines K151, K234 and K259, on the matched CD3+ 
T lymphocytes derived from K234, and on normal donor PBMCs. This 451 bp 
DNA fragment was produced in all the tested samples. A ~140 bp DNA 
fragment was also observed, especially in the DNA isolated from K151 cancer 
cells (FIG. 12). This result suggests that Alukwd is present both in breast cancer 
tissue and cultured cells from breast cancer patients, as well as in their normal 
PBMCs. DNA sequence analysis of the 451 bp PCR products revealed 100% 
homology to the sequence derived from genomic DNA walking, in which the 
315 bp antisense Alukwd sequence was inserted between nt 8762 and 8763 of 
the ELF3 gene. There was no difference in the DNA sequence found in the 
breast cancer cells, matched normal cells and PBMCs. The ~140 bp DNA 
fragment seen in the K151 cancer cells and some other samples indicated the 
presence of ELF3 genomic DNA without the Alukwd insertion, suggesting 
heterozygosity in these patients' ELF3 gene, with one copy of the gene lacking 
the antisense Alukwd. 
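
The band pattern reasoning above rests on simple arithmetic: removing the 315 bp insertion from the 451 bp amplicon leaves 136 bp, which matches the ~140 bp band seen on the gel. The following Python sketch illustrates that reasoning. The function names and the 15 bp sizing tolerance are assumptions made for illustration, not part of the described method, and a single band on a gel is not by itself proof of homozygosity.

```python
ALU_KWD_LENGTH = 315
AMPLICON_WITH_INSERT = 451
AMPLICON_WITHOUT_INSERT = AMPLICON_WITH_INSERT - ALU_KWD_LENGTH  # 136 bp


def has_band(band_sizes, expected, tolerance):
    """True if any observed band lies within tolerance of the expected size."""
    return any(abs(size - expected) <= tolerance for size in band_sizes)


def interpret_bands(band_sizes, tolerance=15):
    """Classify a lane by which of the two expected amplicons it shows.

    tolerance is a hypothetical allowance for gel sizing error.
    """
    with_insert = has_band(band_sizes, AMPLICON_WITH_INSERT, tolerance)
    without_insert = has_band(band_sizes, AMPLICON_WITHOUT_INSERT, tolerance)
    if with_insert and without_insert:
        return "both forms present (consistent with heterozygosity)"
    if with_insert:
        return "insertion-containing form only"
    if without_insert:
        return "insertion-free form only"
    return "no expected band"


print(AMPLICON_WITHOUT_INSERT)
print(interpret_bands([451, 140]))
```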

Retention of Alukwd in ELF3 mRNA in breast cancer cells. A cDNA library 
was constructed from breast cancer cell lines and normal cell lines as previously 
described. This library was screened with the same Alu primers used in the 
PCR reaction to see if Alukwd was expressed in these cells in a 
fashion similar to GC3 (intron 7) described in Example 1. We included a cDNA 
library from the well-studied human breast cancer cell line MCF-7. The 
results are shown in FIG. 13. Alukwd expression was present only in the 4 breast 
cancer cell lines (K151, K234, K259 and MCF-7) but not in matched normal cell 
lines. 

Contamination with genomic DNA during RNA isolation may have 
resulted in contamination of our cDNA libraries. Such DNA would be amplified 
in the highly sensitive RT-PCR technique we used in our study. In order to 
exclude the possibility that the PCR products might result from amplification of 
contaminating genomic DNA in our RNA isolates, DNAase I and RNAase 
digestion was performed on the total RNA preparation from the K151 cancer 
cell line before cDNA synthesis by MuLV reverse transcriptase. The purified 
RNA after digestion was reverse transcribed to cDNA. In these studies, β-actin 
and Alukwd amplifications were not detected in the RNAase digested RNA 
sample, but were present in the RNA sample after DNAase I treatment. This 
indicates that Alukwd expression in the breast cancer cell lines was not the result 
of genomic DNA contamination and that Alukwd was retained in mRNA isolates 
from breast cancer cell lines. We also tested for the presence of Alukwd using the 
Alukwd primers in 8 paired cDNAs prepared from human breast cancer tissue and 
matched normal tissues. The result is shown in FIG. 16. The 451 bp Alukwd- 
containing DNA fragment was produced in 5 of 8 breast cancer tissues (62.5%), 
but was not found in the matched normal tissues even though β-actin was 
expressed equally in all tissues. 

To verify that Alukwd is retained in the cytoplasmic mRNA, RNA was 
purified from isolated nuclear and cytoplasmic fractions of K151, K259 and 
MCF-7 human breast cancer cell lines as described in Materials and Methods. 
Human histiocytic lymphoma cell line U-937, human T cell leukemia cell line 
Jurkat, and human cervical carcinoma cell line C-33A were similarly analyzed. 
RT-PCR results showed that the ~451 bp Alukwd-containing PCR product was 
generated in the cytoplasmic and nuclear RNA of K151, K259 and MCF-7, but 
was present only in the nuclear RNA from C-33A and U-937. No PCR product 
was produced in either the nuclear or cytoplasmic RNA from Jurkat (FIG. 17B). 
The same amount of RNA prepared after DNAase digestion was not subjected to 
reverse transcription but was instead tested by PCR using the Alu primers. 
There was no amplification of the 451 bp DNA fragment in either the nuclear or 
cytoplasmic fraction (FIG. 17C). These results indicate that the 451 bp Alukwd- 
containing intron 8 fragment is retained in the cytoplasmic mRNA of human 
breast cancer cell lines K151, K259 and MCF-7 and is not due to genomic 
contamination of RNA prior to preparation of cDNA. The 393 bp β-actin DNA 
could be found in all cDNA samples by RT-PCR, demonstrating the integrity of 
the RNA and showing that similar amounts of RNA were present in each sample 
(FIG. 17A). 
Association of intron 7 and intron 8 Alukwd retention of ELF3 in PBMCs 
from patients with ductal carcinoma in situ (DCIS). As retention of intron 7 
and intron 8 Alukwd appeared to be exclusively in breast cancer tissues and 
cancer cell lines, we explored the possibility that these findings might be 
extrapolated to the peripheral blood, as a useful marker for breast cancer. 
cDNA libraries were prepared from peripheral blood mononuclear cells 
(PBMCs) and these libraries were screened for GC3 (intron 7) retention and for 
Alukwd retention. The pathological diagnoses of these patients were unavailable 
during the analysis of the samples for intron retention. RNA was extracted from 
these cells as described above, and cDNA libraries were prepared. All RNA 
isolates were tested for genomic DNA contamination using the GC3 and β-actin 
primers. Only one sample demonstrated genomic contamination and it was not 
used in our analysis. Commercially purchased lymphocytes from normal 
healthy adult donors were similarly analyzed. The cDNA libraries from these 
cells were tested using both the GC3 primers for analysis of intron 7 retention 
and the Alu primers for intron 8 Alukwd retention. 

After analysis, charts and records were reviewed to determine the type of 
breast cancer present and to determine the stage of disease. Pathology reports 
were used to determine the type of cancer and were read by different 
pathologists at the time of biopsy and independent of this study. These reports 
indicated that many specimens were from patients with ductal carcinoma in situ 
(DCIS) either alone or in the presence of invasive ductal carcinoma (DCIS+/- 
IDC). Invasive ductal carcinoma (IDC) was sometimes reported without 
mentioning DCIS. Some patients had lobular carcinoma (ILC) with or without 
lobular carcinoma in situ (LCIS), and/or DCIS+/-IDC. In 2 patients adequate 
pathological descriptions could not be found and these samples were not used. 

Representative gels are shown in FIG. 15, and a summary of the results is 
presented in Table 2. In patients whose report indicated the presence of DCIS 
with or without other forms of invasive cancer, intron 8 Alukwd retention was 
seen in 10/27 (37%) while it was present in only 3/25 (12%) patients who did 
not have a pathological description of DCIS. This difference was statistically 
significant at p≤0.01 by the chi square test. Alukwd retention was not seen in 
any of the 20 normal blood donors (Table 2). The same samples when screened 
for GC3 retention showed this intron to be retained in 13/27 (48%) of DCIS+/- 
IDC while it was present in only 3/25 (12%) cancers without a description of 
DCIS. This difference was statistically significant (p≤0.01). GC3 retention was 
present in only 2/20 normal PBMCs, but these bands were very faint, with 
insufficient DNA to sequence adequately and confirm that they represented GC3 
DNA. The association of Alukwd and/or GC3 with DCIS+/-IDC was statistically 
different from controls (p≤0.01). All patients showing Alukwd retention also 
showed GC3 retention. 
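
The comparisons above are 2x2 chi square tests on the retention counts. The text does not state whether a continuity correction was applied, so the following Python sketch offers both variants as an assumption; the function name is hypothetical. It uses the Pearson statistic for a 2x2 table and the fact that, for one degree of freedom, the upper-tail probability equals erfc(sqrt(chi2/2)). A reader can use it to re-derive the statistics from the counts given.

```python
import math


def chi_square_2x2(a, b, c, d, yates=False):
    """Pearson chi square for the table [[a, b], [c, d]] with 1 degree of freedom.

    Returns (chi2, p_value). yates=True applies the continuity correction;
    whether the original analysis used it is not stated in the text.
    """
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        diff = max(0.0, diff - n / 2)
    chi2 = n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = math.erfc(math.sqrt(chi2 / 2))  # survival function for df = 1
    return chi2, p_value


# Rows: DCIS-related vs. non-DCIS; columns: retained vs. not retained.
alu_chi2, alu_p = chi_square_2x2(10, 27 - 10, 3, 25 - 3)
gc3_chi2, gc3_p = chi_square_2x2(13, 27 - 13, 3, 25 - 3)

print(f"Alu retention: chi2={alu_chi2:.2f}, p={alu_p:.3f}")
print(f"GC3 retention: chi2={gc3_chi2:.2f}, p={gc3_p:.4f}")
```

With cell counts this small, an exact test would be a reasonable alternative; the sketch keeps to the chi square test named in the text.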
Table 2: Summary of clinical results 

                      Breast cancer with     Breast cancer with     Normal 
                      DCIS related           non-DCIS related       donors 
                      subtype (n=21)         subtype (n=28)         (n=20) 

GC3 retention (%)     13/27 (48.15%)         3/25 (12%)             2/20 (10%) 
  P value             vs. non-DCIS <0.01 
                      vs. normal <0.01 

Alu retention (%)     10/27 (37.04%)         3/25 (12%)             0/20 (0%) 
  P value             vs. non-DCIS =0.05 
                      vs. normal <0.05 

The Effect of Addition of Breast Cancer Cells to PBMCs on the Detection 
of ELF3 intron 7 (GC3). The presence of ELF3 expression in the form of intron 
retention could be the result of circulating breast cancer cells in the peripheral 
blood which were detected by our methodology. In order to understand the 
sensitivity of our detection system, we added 10-fold increasing concentrations 
of GC3-expressing K259 breast cancer cells, from 1 cell up to 1x10^6 cells, into 
2x10^6 PBMCs that did not demonstrate GC3 or Alukwd retention. RNA was 
extracted from each dilution and 2 µl RNA (between 1-3 ng) was used for cDNA 
synthesis using methods described. These dilutions were tested for the presence 
of GC3 using GC3 primers which amplify 202 bp of intron 7 of ELF3. As shown in 
FIG. 19, the correct PCR product was visible with a dilution of 1.0x10^6 to 
1.0x10^3 per 2x10^6 PBMCs, indicating an ability to detect at least 1 cancer cell in 
2000 normal PBMCs. Many of the PBMCs which were tested for GC3 were from 
women who have been in remission from breast cancer for many years and/or 
from women who have been on therapy but were not considered to have active 
metastatic disease. This suggests that the presence of intron retention of GC3 
or Alukwd is not due to circulating breast cancer cells but due to some more 
basic abnormality, detectable in the PBMCs of women with breast cancer. 
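
The detection limit quoted above follows directly from the spike-in series. The short Python sketch below reproduces the arithmetic; the variable names are hypothetical, and the list of detected dilutions simply encodes the range reported for FIG. 19 (1.0x10^3 cells and above).

```python
from fractions import Fraction

PBMC_BACKGROUND = 2 * 10**6                       # PBMCs per spiked sample
SPIKED_CELLS = [10**k for k in range(0, 7)]       # 1, 10, ..., 10^6 K259 cells

# Dilutions reported to give a visible 202 bp GC3 product.
DETECTED = [cells for cells in SPIKED_CELLS if cells >= 10**3]

# Lowest detected spike relative to the PBMC background.
detection_limit = Fraction(min(DETECTED), PBMC_BACKGROUND)

print(f"detection limit: 1 cancer cell in {detection_limit.denominator} PBMCs")
```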
Discussion 

Using cells, tissues, and cell lines from breast cancer patients, and 
applying gene walking technology, a unique novel Alu element in the ELF3 gene 
has been found. The Alu, dubbed Alukwd, is inserted in a reverse orientation into 
another Alu within intron 8 between positions 8762 and 8763. Two forms of 
intron 8 DNA exist in our cancer cell lines: one contains Alukwd and the other 
lacks this element, indicating heterozygosity of the ELF3 gene. 

Alukwd appears in cDNA in both human breast cancer cell lines and breast 
cancer tissue specimens. The presence of unspliced mRNA containing Alukwd in 
the cytoplasm of the neoplastic cell lines is not due to genomic contamination of 
RNA prior to creation of cDNA libraries. Alukwd is also not found in normal 
breast epithelial cells or in a limited number of malignant cells from non-breast 
derived cell lines. Strikingly, 35.7% of breast cancer patients with 
DCIS, with or without invasion, express Alukwd in their PBMCs. A fragment of 
intron 7 of the ELF3 gene, previously designated GC3, is similarly retained in 
the cytoplasm of 46.4% of the PBMCs from breast cancer patients with DCIS 
with or without invasion. 

Alu elements are ubiquitous in the human genome, which contains 
500,000 to 1,000,000 copies representing 5-10% of the total DNA. They can 
insert themselves into the genome by using "borrowed" reverse transcriptase 
(Schmid, 2000). They are generally not found within the coding region but 
have been found in introns and occasionally in non-translated regions of mRNA 
(Szmulewicz et al., 1998). Previously thought to be "junk DNA" derived from 
inactivated sequences, Alu cDNAs can insert themselves into genes where they 
can interfere with, or alter gene function, by interacting with promoters or 
enhancers as well as introns and exons. They have been shown to induce 
alternate splicing in some families with BRCA1 mutation. 

It is unclear if Alukwd interferes with splicing. Alu elements are generally 
spliced out of the final forms of mRNA. The finding of retained Alukwd in 
cytoplasmic mRNA of breast cancer cells and tissue, along with the previously 
described GC3 fragment of intron 7, is evidence that a gross splicing defect is 
present in the ELF3 gene in breast cancer. The retention of introns 4, 5, 6, 7 and 
the Alu element in intron 8 also favors this assumption. This defect is not 
present in all breast cancer cell lines or tissue. 

The expression of ELF3 is generally thought to occur only in epithelial 
cells. We have shown, however, that we can find unspliced mRNA of ELF3 which 
includes GC3 (intron 7) and the Alukwd element within intron 8 in the PBMCs of 
patients with breast cancer, especially in those with DCIS with or without 
invasion, as opposed to all other diagnoses. It was not present in PBMCs from 
most normals or in patients whose pathological reports did not indicate DCIS. 
Its presence is apparently not due to circulating metastatic cancer cells, as most 
patients were in remission, so it is unlikely that they had more than 1 cancer 
cell per 2000 normal PBMCs, which is the limit of detection of cancer cells with 
abnormal intron retention in our system. This is evidence of an important ELF3 
splicing error related to breast cancer. The ELF3 gene appears to be important 
in DCIS and may be associated with regulation of HER2/neu (Chang et al., 
1997). 

The presence of intron retention in the PBMCs of a certain cohort of 
cancer patients is consistent with a global splicing error in some patients with 
breast cancer, and may be due to some hidden viral element that interferes with 
splicing. If a putative virus is responsible in some way for breast cancer, it could 
be searched for using intron retention or Alukwd as a marker for its presence, 
similar to the way reverse transcriptase was used as a marker to find the HTLV-1 
virus (Poiesz et al., 1980) and HIV-1 (Gallo et al., 1984). These findings open 
up a different approach to the epidemiology of breast cancer and provide new 
useful tools for the study of this disease. 



Example 3. Viral induction of ELF3 mRNA intron retention and Alukwd. 

As established in the previous Examples, ELF3 intron 7 (GC3) and intron 
8 (Alu) retention was only observed in certain breast cancer cells and tissues as 
well as in peripheral blood mononuclear cells (PBMCs) from about 50% of DCIS 
breast cancer patients. The hypothesis that a virus, specifically a retrovirus or a 
herpesvirus, may be involved in the cause of breast cancer has been proposed 
for a long time. To date no virus has been clearly implicated, although some 
have tried to implicate mouse mammary tumor virus and possibly EBV as a 
cause of human breast cancer. Therefore we evaluated whether virus infection 
could induce ELF3 intron 7 (GC3) retention in a cell line. Establishment of the 
induction of ELF3 intron retention by viral infection would establish that viral 
presence, particularly a virus associated with breast cancer, can be investigated 
by evaluating whether ELF3 introns are retained in mRNA. To this end we 
performed the following experiments. 

RT-PCR was performed on RNA extracted from PBMCs of 8 HIV-1 
infected patients, PBMCs of 1 HTLV-1 infected patient, and from 1 HTLV-1 
infected T cell line. GC3 expression was not observed in any of these RNAs 
from these retrovirally-infected cells. 

We next evaluated whether infection with any of 7 human herpesviruses 
could induce intron retention, by RT-PCR analysis of GC3 expression in RNA 
preparations from infected cells. Herpes simplex virus I (HSV1), herpes simplex 
virus II (HSV2), varicella zoster virus (VZV), Epstein Barr virus (EBV), 
cytomegalovirus (CMV), human herpes virus 6 (HHV6) and human herpes virus 
8 (HHV8) were the herpes viruses tested. Separate cultures of the MRC5 cell line 
were each inoculated with a laboratory strain of HSV1, HSV2, VZV or CMV. 
When the cytopathic effect (CPE) characteristic for each virus appeared, the 
infected MRC5 cells were collected, and cell pellets were kept at 90°C. 
EBV-transfected B cell lines, the HHV6-infected cell line HSB2, and HHV8-positive 
Kaposi's sarcoma cells were also used for this study. Uninfected MRC5 cell 
lines and HSB2 cell lines were used as normal controls. PCR of the RNAs 
without reverse transcriptase using GC3 primers, as in the previous examples, 
was performed to rule out DNA contamination. 
In these studies, GC3 expression appeared only in the RNA extracted 
from EBV-infected cell lines. To confirm that EBV infection can induce ELF3 
intron 7 (GC3) retention, further experiments were performed using EBV strain 
B95-8 (obtained from ATCC). This strain was used to infect BJAB cells. 
BJAB is an EBV-negative B cell line that is also negative for intron retention of 
GC3. The cell pellets were prepared from EBV-infected BJAB cells at day 2, day 
4, day 7, day 9, day 11 and day 14 after infection. BJAB without EBV infection 
was used as a control. 

ELF3 intron 7 retention was observed in all EBV-infected BJAB cells 
from day 2 to day 14. There was no ELF3 intron 7 retention demonstrated in 
normal BJAB cell lines without EBV infection. These results indicate that EBV 
infection can induce ELF3 intron 7 retention in infected cell lines. This would 
suggest that an EBV-like virus or even EBV itself might play some role in the 
production of breast cancer. We have demonstrated that the cell lines described 
in the previous Examples that are derived from breast cancers do not show 
evidence of EBV infection when tested with appropriate EBV PCR primers. We 
thus believe that a novel virus may play some role in breast cancer and induce 
intron retention. 

In view of the above, it will be seen that the several advantages of the 
invention are achieved and other advantages attained. 

As various changes could be made in the above methods and 
compositions without departing from the scope of the invention, it is intended 
that all matter contained in the above description and shown in the 
accompanying drawings shall be interpreted as illustrative and not in a limiting 
sense. 

All references cited in this specification are hereby incorporated by 
reference. The discussion of the references herein is intended merely to 
summarize the assertions made by the authors and no admission is made that 
any reference constitutes prior art. Applicants reserve the right to challenge the 
accuracy and pertinence of the cited references. 
Appendix - SEQ ID NO:s 

SEQ ID NO:1 and SEQ ID NO:3. From GenBank Accession No. AF110184. 
SEQ ID NO:1 - ELF3 gene (annotated) - AF110184 and 
SEQ ID NO:3 - human ELF3 amino acid sequence alternative 1. 
LOCUS       AF110184 10772 bp DNA linear PRI 22-JUL-1999 
DEFINITION  Homo sapiens epithelium-restricted Ets protein ESX gene, 
            complete cds. 
ACCESSION   AF110184 
VERSION     AF110184.1 GI:5565858 
SOURCE      human. 
  ORGANISM  Homo sapiens 
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; 
            Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo. 
REFERENCE   1 (bases 4802 to 9870) 
  AUTHORS   Chang,C.H., Scott,G.K., Kuo,W.L., Xiong,X., Suzdaltseva,Y., 
            Park,J.W., Sayre,P., Erny,K., Collins,C., Gray,J.W. and Benz,C.C. 
  TITLE     ESX: a structurally unique Ets overexpressed early during human 
            breast tumorigenesis 
  JOURNAL   Oncogene 14 (13), 1617-1622 (1997) 
  MEDLINE   97275260 
  PUBMED    9129154 
REFERENCE   2 (bases 1 to 10772) 
  AUTHORS   Chang,C.H., Scott,G.K., Baldwin,M.A. and Benz,C.C. 
  TITLE     Exon 4-encoded acidic domain in the epithelium-restricted Ets 
            factor, ESX confers potent transactivating capacity and binds to 
            TATA-binding protein (TBP) 
  JOURNAL   Oncogene 18 (25), 3682-3695 (1999) 
  MEDLINE   99318560 
  PUBMED    10391676 
REFERENCE   3 (bases 1 to 10772) 
  AUTHORS   Chang,C.H., Scott,G.K. and Benz,C.C. 
  TITLE     Direct Submission 
  JOURNAL   Submitted (30-NOV-1998) Hematology/Oncology, U.C.S.F., 505 
            Parnassus Ave., San Francisco, CA 94143-1270, USA 
FEATURES             Location/Qualifiers 
     source          1..10772 
                     /organism="Homo sapiens" 
                     /db_xref="taxon:9606" 
                     /chromosome="1" 
                     /map="1q32" 
     misc_feature    34..622 
                     /note="similar to THC 213038" 
     repeat_region   921..1524 
                     /rpt_family="Alu" 
                     /rpt_type=dispersed 
     repeat_region   2978..3293 
                     /rpt_family="Alu" 
                     /rpt_type=dispersed 
     CAAT_signal     4697..4702 
                     /evidence=not_experimental 
     TATA_signal     4735..4736 
                     /evidence=not_experimental 
     mRNA            join(4777..4888,5311..5481,6139..6360,6526..6618, 
                     6822..6941,7129..7218,7364..7480,8011..8206,9076..9872) 
                     /product="epithelium-restricted Ets protein ESX" 
     5'UTR           join(4777..4888,5311..5318) 
     exon            4777..4888 
                     /number=1 
     misc_feature    4785..4901 
                     /note="putative CpG island" 
     exon            5311..5481 
                     /number=2 
     CDS             join(5319..5481,6139..6360,6526..6618,6822..6941, 
                     7129..7218,7364..7480,8011..8206,9076..9190) 
                     /note="epithelial-restricted with serine box; Homo 
                     sapiens ESX cDNA ORF presented in GenBank Accession 
                     Number U66894" 
                     /codon_start=1 
                     /product="epithelium-restricted Ets protein ESX" 
                     /protein_id="AAD45237.1" 
                     /db_xref="GI:5565859" 
                     /translation (SEQ ID NO:3)= 

"MAATCEISNIFSNYFSAMYSSEDSTIASWPAATFGADDLVLTLSNPQMSLEGTEKA 

SWLGEQPQFWSKTQ\^WISYQVEKNKYDASAIDFSRCDMDGATLCNCALEELRL 

WGPLGDQIJiAQIJU^LTSSSSDELSWlIELLEKDGMAFQEALDPGPFDQGSPFAQEL 

LDDGQQASPYHPGSCGAGAPSPGSSDVSTAGTGASRSSHSSDSGGSDVDLDPTDG 

KIJTSDGFPJ)CKKGDPKHGKRKRGRPRKLSKEYWDCI£G^ 

DILIHPEIJ^EGIJVIKWENRHEGVFKFTJISFAVAQLW 

YYKREILERVDGRRLVYKFGKNSSGWKEEEVLQSRN" 
     repeat_region   5773..6059 
                     /rpt_family="Alu" 
                     /rpt_type=dispersed 
     exon            6139..6360 
                     /number=3 
     exon            6526..6618 
                     /number=4 
     exon            6822..6941 
                     /number=5 
     exon            7129..7218 
                     /number=6 
     exon            7364..7480 
                     /number=7 
     misc_feature    7401..7525 
                     /note="putative CpG island" 
     exon            8011..8206 
                     /number=8 
     repeat_region   8655..8775 
                     /rpt_family="Alu" 
                     /rpt_type=dispersed 
     exon            9076..9872 
                     /number=9 
     3'UTR           9191..9872 
     polyA_signal    9845..9850 
                     /evidence=not_experimental 
     misc_feature    complement(9952..10387) 
                     /note="similar to THC 209689" 
     misc_feature    10358..10772 
                     /note="similar to THC 203540" 
BASE COUNT  2486 a 2843 c 2985 g 2458 t 
ORIGIN 

1 aagcttctta ggcatgtgta tgtgtgtttc ttgcagggga agcagaagta tacacttccg 
61 ctgtaccacg caatgatggg tggcagtgag gtggcccaga ccctcgccaa ggagactttt 
25 121 gcatccaccg cctcccagct ccacagcaat gttgtcaact atgtccagca gatcgtggca 

r 

181 cccaagggca gttagaggct cgtgtgcatg gcccctgcct cttcaggctc tccaggcttt 
241 cagaataatt gtttgttccc aaattcctgt tccctgatca acttcctgga gtttatatcc 
301 cctcaggata atctattctc tagcttaggt atctgtgact cttgggcctc tgctctggtg 
361 ggaacttact tctctatagc ccactgagcc ccgagacaga gaacctgccc acagctctcc 
30 421 ccgctacagg ctgcaggcac tgcagggcag cgggtattct cctccccacc taagtctctg 
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10 



15 



20 



25 



30 



/ 

481 ggaagaagtg gagaggactg atgctcttct tttttctctt tctgtccttt ttcttgctga 
541 ttttatgcaa agggctggca ttctgattgt tcttttttca ggtttaatcc ttattttaat 
601 aaagttttca agcaaaaatt aagttacgga ttgagtgact attaaatttc ttccaccaga 
661 ggtcctcact gtgtttgttc aggaaaggtc actgggggag gcccagagaa tgacagtatt 
721 ttcctgtcct cagggaacag ccagggtgaa ggaggtgggt gtcctacaca tgcatatgaa 
781 aaaaaatatg gcaaaatggc acagctggtg caggaaaatg aaaaaggaat agcattccag 
841 ttctccgtga agcagctgaa ttctctatct gcagcagcat tcccattatc ttttccatca 
901 ctaagaaaaa aaaatgggct gggcacggtg gctcatgcct gtaatcccag cactttggga 
961 ggctgaggcg agaggatcgc ttgagcccag gagtttgaga ccaccctggc caacatagca 
1021 ggacttcatc tctaccaaaa aaaaaaaaaa aaaaaaaaaa aagccaggcg tggtggctca 
081 cgcctgtaat ctcaacactt tgggaggctg aggcaggcaa atcacttgag gtcagaagtt 
141 tgagaccagc atggccaaca tggtgaaacc ccatctctac tgaaaaaaaa gatagatgca 
201 aaaattagcc aggcatggtg gctcacacct gtagttccag ttacttggga ggctgaagca 
.261 ggagaaacac ttgaacctgg gaggtggagg ttgcagtaaa ctgagatcat gccactgcac 
.321 tccagcctgg gtgacagagt aagacttctc aaaaaaaaaa aaaaaaagct gggcgtggtg 
381 gtgcattcct gtggtttcag ctactcagga ggctgaggca ggaggatcac ttgagcccaa 
441 gaggtcaagg ccacagtgag tcttgattgt gccactgaac tccagcctga gtgacagagt 
501 gagaccctgt ctcaaaaata aaaataaagt gtcttatgac tttttatcta cccttctgcc 
.561 catgcccaag gcttcactgg gcctcacctg tctttgatcc tagataacta tttgaatggt 
.621 aatcaagtaa agtctttaga acttagcact aaattctgat ttcctggcct caacatgggg 
681 acctaaacag ttagcaatct gggtttggga gtgggatgag gggagggttg gaagaaatat 
741 ttagtgtgtt tcatttgcct ttcttaaata cagggcaccc ctgaaacagg ctttgttcgc 
801 agctctgctc tgtcctcgga tttaggttat cgaacaggct tcctccctcc cctgcacaag 
861 ggttgggaat gagtcgattt gctttcactc agcaagagca agggactagt ggtgaccaag 
921 tggtagactg gagaggcctc tgccccgtgg cacacagctc caccatcaga gagggtgatg 
*981 tgggtcatag gtgagggatc tggaggcccg gtatcggaag agcttctcca ggcactggca 
2041 ttttgacagc aaactgcttc cgtggctctt tcaggactgt tcctgggcaa tatgttattg 
2101 gcaaggacta ttttagggct atccagttgt ctccccctct ccccaacctt ttatctagct 
2161 tatcagtagc tatctttcct tgctctgtac aaaaacctat agcaccaata ggcccagtaa 
2221 tcatgaaggg tcagtgcaag gaaaggctgg aagcccttcc tctaacagcc gtgctgtgac 
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2281 tccactaact ttgtggggtc tcccattaca tagcgtgggt atcctgagct gtgcagcctg 
2341 cctcactcac caccttggta cctgacagga ctactggatg tgcctgtcct tttgtaggac 
2401 attctcccat cccaaagatg aggctgtgct gccgtgtggg caagctctgt ggggagaggg 
2461 gaggccagtg ggttgttttt gccatcacag aatactggga agcccctggc atcctgctcc 
2521 atagctctct tcaccactat cctggaacct tctccccacc cccatcccca tgcctccaag
2581 gcactgacct caaatccaag tctttctcac ttatctcaag ctgccagcct gtagggattc 
2641 cttatctcag ctccatgtca gcggtgagga agccccaaga aggcaaggga gctgacagcc 
2701 ttctcatttt tctcgtacat cctcctgttc accccgccat cccgggagcc ccagccagat 
2761 gctcttcagg gcagggagca cgtgagcagc cctggggcta gaagccggtt ctcccacatt 

2821 cctgggtgag ggactgggtg gagggtgtgc ctgcctcagg ctccttgggg gaggccccct
2881 gaagggctgg ggaaaatcct actgagcccc aggctctcct gcctgcactg gcccagtgcg 
2941 ggggcggggg ggcgggggga tcctacattt caaatgcata aaaatctaga tatgggctgg 
3001 gcgcagtagc tcatgcctgt aatcccagca ctttgggagg ctgaggcagg cagatcatga 
3061 ggtcaggaga tcgagtccat cctggctaac atggtgaaac cccgtctcta ctaaaaatac 

3121 agaaagccgg gcatggcagc gggcgcctgt aatcccagct actcggaaag ctgaggcagg
3181 agaatcgctt gaacccacga gtcagaggtt gcagtgagca gatatcacgc cactgcactc 
3241 caacctgggc gacagagcga gactccacct caaaacaaaa taaaccaaat actagatctg 
3301 gaagagatct tagggattat taaattcaga caacctcatt ttttatagat ggggaaacaa 
3361 gcacagactc caagggtctc atccaagatc acacagttgc agatgctggc tacaagtctc 

3421 ctgcctcaac cacctgtatt accccattca gggtctcaag aagggtctat aagacactat
3481 ccattgtgtt tcgggctgag tccatagaga caaccacaga catgggggac tctgcccaca 
3541 gggaaggcaa gggctctggc catggagctg gatgggaaga ctctgaagcc cgaagacatt 
3601 gaatcctgtg cagggaaaga gcgagggttt tgtgtacaac acacctgcat acctggatgt 
3661 gaatctcagc tccacccctt caccaactct gtgtggcctg ggcaagccat tctaagggaa 

3721 ccctccacac tgcaactttc atgtctataa aatgggaata accatgcatt ccttacagga
3781 cttttttggt gtgaggatta aatgagagaa tatgttgaaa agtgcttggt aaatatatta 
3841 atactatgca ttccctcttc tttgaatgac gtgacccagg tagtcaggct tctgaccact 
3901 agagggcagc agaaggtact ggaaaactgg gccgagtgaa ccagagatta gatggggtcc 
3961 agagagcagg gatgaactta cccgtgtgga ttctggcaac tccggcaggg agggctccag 

4021 caggcgctga gggaagaact ttcaagcaga gccgggtctc ttcaggagcg actgcagcaa
4081 ccctgatgct tggatggagt ccaggcaggt gatggtagtg aagaccttgc caacagagtg
4141 ggcgctggag aaggagccct ttagtgggga ccctggggcc acgactaggc tggcaggccc 
4201 agccagcacc aattaatcca tgagtattgc ccagcattga gcctggagca ccttccagcc 
4261 cctggccaga gtcctgggtg ttctgggaaa aacccctaaa cctagtaact cctctcccta 
4321 ctaggcctct ttgttgctga atctctggaa tttaggggcc agcagctttc tgactcaggt
4381 cagccagggg ttcatgttcc ctcacttgcc ctccccctgc ctggcccatc tctggcctgg
4441 cccctgggag gaatttcctg ggccagaggg cagccgaaag cacagatgcc caccccagca 
4501 acgttcccgc cacctgccca ggccagtgcc ccgtgcccaa ccccagaggg tgcgggatga 
4561 cagactctga caatcattaa accagccggg cctgatttcc cagcactgcc tgctaagatc 

4621 cgggccaagt ggcactgaat atgcaaatca cctggggcca ggagcccagt ctaaaggcca
4681 ggaaatcccc tccatccaat gagacaccag ctcaggttac tgcaggggac acactataaa 
4741 gccctgagct cagggaggag ctccctccag gctcta START mRNA> ttta 
gagccgggta ggggagcgca 

4801 gcggccagat acctcagcgc tacctggcgg aactggattt ctctcccgcc tgccggcctg 

4861 cctgccacag ccggactccg ccactccg INTRON1> gt aggattcccc gcctgtcatt
ccctagccca 

4921 gctcttggga aactgcagag gggtccagag gatttgcagt tctgaacctg cacactccag 
4981 tctaggatct ccgagcaaga gcgtaggtgt cctgagggtc aaagaacaga gagagattgt 
5041 ctctgggaag gcagaatggc catgacgccg ctagtctggc tccagggccc cagagatctg 

5101 aggagggaag cccagctgga ggctcctgtg gtcctgccct ggtctgagat cttggagccc
5161 ttcttgaaga gacggtgtcc gcagagttgc tgatcttcct gcccctgggg gctactcttg 
5221 cccagggttg ggcaaagcag agtagctggg agtgtaagga gaggaccctc gtcccctcac 
5281 caacctcatc ctctctcccc ctacccacag EXON2> gtagcctc START CDS> at 
ggctgcaacc tgtgagatta 

5341 gcaacatttt tagcaactac ttcagtgcga tgtacagctc ggaggactcc accctggcct
5401 ctgttccccc tgctgccacc tttggggccg atgacttggt actgaccctg agcaaccccc 
5461 agatgtcatt ggagggtaca INTRON2> ggtgggtctc agcggggtgg gatggggcac
ggagtgggag 

5521 acagatccat ctaagggcct gttagacaaa tgggggaata ggcagggagg agggtctcta 
5581 ggcaaattcc agggctagag gctgagactt agtgactgag gtgctggggg ttgtggggct
5641 gtgacaggca gagggaggtg tcagatacca ggacaagggt gttgtgaatg ctacctcctg
5701 cccctactct tgggatggct ccaagggctg aggtgtgaat ccccagtgtg ctccaggaat 
5761 ggggctgtgt gggctgggag tggtggctca cgcctgtaat cccagcactt tgggaggctg 
5821 agctgagcgg atcacctgag gtcaagagtt cgagaccagc ctagccaaca tggtgaaacc 
5881 ccgtctctac taaaaataca aaaaaaaatt tatcccagcg tggtggtggg cacctataat
5941 cccagctact ggggaggctg acgcaggagt atcgcttgaa cctgggaggt ggaggttgct 
6001 gtgagccgag attgtgccat tgcaccccag cctaggtgac aggagtgaga ctccatctca 
6061 aaaaaaaaaa aaaaaatggg gctgtaaggt ctgctgggtg gcctgagctg agcctgtttc 
6121 cctgcctggc ccttgcag EXON3> ag aaggccagct ggttggggga acagccccag 
ttctggtcga

6181 agacgcaggt tctggactgg atcagctacc aagtggagaa gaacaagtac gacgcaagcg 
6241 ccattgactt ctcacgatgt gacatggatg gcgccaccct ctgcaattgt gcccttgagg 
6301 agctgcgtct ggtctttggg cctctggggg accaactcca tgcccagctg cgagacctca 
6361 INTRON3> gtgagtccag gcccctggag gctggggagc agctccacat gttgagctga 
gtcgagttca

6421 gtgtggccgt aggcaggccc tggagctctg ggccagctgc acagccagag agagcccttg 
6481 agggagggat taggggagtg tgacccttcc ttccttcctt gtcag EXON4> cttcc 
agctcttctg 

6541 atgagctcag ttggatcatt gagctgctgg agaaggatgg catggccttc caggaggccc 
6601 tagacccagg gccctttg INTRON4> gt gagaacccgt tttctccttc cttccccagc
ctgtcttgtc 

6661 ccatccctgc ccctccacag agtgctagag atgaccccct ccccagactt cttcctccct 
6721 caattagaaa aattgcagca ggtcatcaga cccatgggca gcatcacctg tcctggtctg 
6781 gtcccctgag ccctctctga gttctcacct cctcttccca g EXON5> accagggca 
gcccctttgc

6841 ccaggagctg ctggacgacg gtcagcaagc cagcccctac caccccggca gctgtggcgc 
6901 aggagccccc tcccctggca gctctgacgt ctccaccgca g INTRON5> gtgagagct 
ctctctgggc 

6961 cacaacctcc cttccccgaa gtgtcccttg ttccctctgg ctcccagcac cataactcag 
7021 gccttctggc aggaacagga acaggctggg aagtgtgtcc tgagagccag cagcgtggtt
7081 gaacagaagg tgggccggca ggggacttac tctgaccccg ccccccag EXON6> gg
actggtgctt 

7141 ctcggagctc ccactcctca gactccggtg gaagtgacgt ggacctggat cccactgatg 
7201 gcaagctctt ccccagcg INTRON6> gt gagtcgaggg aggtccccaa gagggcgtcc 
catttagcaa

7261 tgcacagggg gcccggctct tcctgcagcc ttttcctgta gaggggctac tctccctaac 
7321 tcccctcttg cccctccttg accttccacc accgtcccca cag EXON7> atggttt 
tcgtgactgc 

7381 aagaaggggg atcccaagca cgggaagcgg aaacgaggcc ggccccgaaa gctgagcaaa 
7441 gagtactggg actgtctcga gggcaagaag agcaagcacg INTRON7> gtgagctccg
ggggcacgtg 

7501 ggtcctccct gcgccgggct gagcggcttc ctggggcact gcgggttgtt gcaggtatcc 
7561 cttctcccgt tttctctggc ctccgcatgg cctttggtaa ggctgtgcac aagctggggg 
7621 ctctatggta tcggtcacca cctaattgca gagcctggct tggtggtcct ggagaggagg 

7681 aggaaataag gctcccagtg ggaggctcat ggtaccagag tcctgtccac tgactccagt
7741 gtcctgtcca ctgactccag ttctctctgc acttggccac tgtcctgccc tctgggacac 
7801 cctcaatgtg aggaggcagc tggtgggtct taggtgggct gaggagaaaa gcagtcactg 
7861 cagtacccgc acagagggca ctgcggggtc tctggagagg cttgctgcat gctgtggcca 
7921 agtcagcagt gcactggggc gggcagggct ggctggcctt gggtgagagg ggacacctgg 

7981 atggcaaact gatggaggct ggccttgcag EXON8> cgcccagagg cacccacctg
tgggagttca 

8041 tccgggacat cctcatccac ccggagctca acgagggcct catgaagtgg gagaatcggc 
8101 atgaaggcgt cttcaagttc ctgcgctccg aggctgtggc ccaactatgg ggccaaaaga 
8161 aaaagaacag caacatgacc tacgagaagc tgagccgggc catgag INTRON8> gtga 
gctggcggcc

8221 aggaccctca cgatacagcc ggacatgggg acaggcgctc acactcccac cgccctcttt 
8281 ctggctgcca cttggtttct tgcaacaggg ctgagtcctt agagtgagga caacatctgg 
8341 gttggtctac ttcatggatt aaatgacaac atggagaaag tattagcctg gcagacagca
8401 gacacagtgc acttgagcta gcagcaacat ttcttgtatc gcctgtgagg cttgtcctca 
8461 ggaaggcacc tggagagtgg gaaagggggc aggagccgtg cccacccagg gcctggcttt
8521 ctcctcgttg aagcacttag gttgtttttc tctgggcctc agtttcctcc tgtgtccagg
8581 agtacactag atcatcttaa gatcccgtcc agccctaaaa tcatgtactt actttttttt 
8641 tctttttctt ttttaaatag aggcaagggt ctctacgttg gccaggccgg tctcaaactc 
8701 ctggcctcaa atgactctcc tgcctcggcc tctcaaagtg ctgggattac aggtgtgagc 
8761 caccgtgccc agctccctgg ccttaaaagt catgtaattt aatgatcaga ccccagtcac 
8821 agccatagga tacaaagaag caaaggcaaa gagccctgtg tcctgggcac ggttacaggc 
8881 cagtgtaggg aaagagcttc tgcttgccag tgtgaagaac agaggagttt aggaagtgtg 
8941 agtcaggctc agcttagtca ggcagagacc agtgggcatg ggttacctgg gggtaacgcg 
9001 ggccaggtgg gcgggctggc agcctggggc ccatttcctg ccaaagcacc tctgaccatc 
9061 cttctcttca cccag EXON9> gtact actacaaacg ggagatcctg gaacgggtgg 
atggccggcg 

9121 actcgtctac aagtttggca aaaactcaag cggctggaag gaggaagagg ttctccagag 
9181 tcggaactga END CDS gggttggaac tatacccggg accaaactca cggaccactc 
gaggcctgca 

9241 aaccttcctg ggaggacagg caggccagat ggcccctcca ctggggaatg ctcccagctg 
9301 tgctgtggag agaagctgat gttttggtgt attgtcagcc atcgtcctgg gactcggaga 
9361 ctatggcctc gcctccccac cctcctcttg gaattacaag ccctggggtt tgaagctgac 
9421 tttatagctg caagtgtatc tccttttatc tggtgcctcc tcaaacccag tctcagacac 
9481 taaatgcaga caacaccttc ctcctgcaga cacctggact gagccaagga ggcctgggga 
9541 ggccctaggg gagcaccgtg atggagagga cagagcaggg gctccagcac cttctttctg 
9601 gactggcgtt cacctccctg ctcagtgctt gggctccacg ggcaggggtc agagcactcc 
9661 ctaatttatg tgctatataa atatgtcaga tgtacataga gatctatttt ttctaaaaca 
9721 ttcccctccc cactcctctc ccacagagtg ctggactgtt ccaggccctc cagtgggctg 
9781 atgctgggac ccttaggatg gggctcccag ctcctttctc ctgtgaatgg aggcagagac 
9841 ctccaataaa gtgccttctg ggctttttct a END mRNA acctttgtc ttagctacct 
gtgtactgaa 

9901 atttgggcct ttggatcgaa tatggtcaag aggttggagg ggaggaaaat gaaggtctac 
9961 caggctgagg gtgagggcaa aggctgacga agaggggagt tacagatttc ctgtagcagg 
10021 tgtgggctta cagacacatg gactgggctg ggaggcgagc aaaggaagca gctgagactg 
10081 ttggagaacg cttacaagac ttcatgcaag caaggacatg aactcagaac actgaggtca
10141 gaagcatcct gctgtcatga caccgctcga gtgaccttga ccttgaccaa gtctgtcctg
10201 tttaggactg atttttccta ttaggctagg gtttggacct gatgttctca agatgtctag 
10261 aattgcatgg ctggccttgt ggaatagatg gttttgcatt ccagccaagt gtgctgtaaa 
10321 ctgtatatct gtaatatgaa tcccagcttt tgagtctgac aaaatcagag ttaggatctt 
10381 gtaaaggaaa aaaaaaaaaa caaaacaaaa tggagatgag tacttgctga gaaagaatga

10441 gggaaggagt tggcatttgt tgaaagtata gtctttttct clllllull taattgcaac 
10501 ttttacttta gatttaggag gtcgtgcgca ggtttgttac atgggtatat tgtgtgatgc 
10561 tgagcttggg atgcgaatga tcctgtcacc caggtagtga gtatagcacc cagtgaaact 
10621 gtagtctcat gccaggcact gtgctagccc actctggctc atttaatcct ctcctaagaa

10681 gagaggagac acagcgtccc catttgacag atgcagaaag aggttccaca ggtgtgcctt 
10741 gattctgtcc taaaaccgtt tcccggaagc tt 

// 

SEQ ID NO:2 - ELF3 cDNA and
SEQ ID NO:4 - ELF3 amino acid sequence alternative 2

1959 bp full length of spliced mRNA of ELF3 gene in breast tumor cell lines and 
predicted amino acid sequence of ELF3 gene. The adenosine at the atg start 
codon is considered the number one nucleotide. 
-135 

ctccgccactccggtaggattccccgcctgtcattccctagcccagctcttgggaaac
tgcagaggggtccagaggatttgcagttctgaacctgcacactccagtctaggatctc 
cgagcaagagcgtagcctc 

1 atggctacaacctgtgagattagcaacatttttagcaactacttc 
MATTCEISNIFSNYF
46 agtgcgatgtacagctcggaggactccaccctggcctctgttccc

SAMYSSEDSTLASVP 
91 cctgctgccacctttggggccgatgacttggtactgaccctgagc 
PAATFGADDLVLTLS 
136 aacccccagatgtcattggagggtacagagaaggccagctggttg
NPQMSLEGTEKASWL
181 ggggaacagccccagttctggttgaagacgcaggttctggactgg
GEQPQFWLKTQVLDW
226 atcagctaccaagtggagaagaacaagtacgacgcaagcgccatt 

ISYQVEKNKYDASAI 
271 gacttctcacgatgtgacatggatggcgccaccctctgcaattgt 

DFSRCDMDGATLCNC 
316 gcccttgaggagctgcgtctggtctttgggcctctgggggaccaa 

ALEELRLVFGPLGDQ
361 ctccatgcccagttgcgagacctcacttccagctcttcttatgag 

LHAQLRDLTSSSSYE 
406 ctcagttggatcattgagctgctggagaaggatggcatggccttc 

LSWIIELLEKDGMAF
451 caggaggccctagacccagggccctttgaccagggcagccccttt 

QEALDPGPFDQGSPF 
496 gcccaggagctgctggacgacggtcagcaagccagcccctaccac 

AQELLDDGQQASPYH 
541 cccggcagttgtggcgcaggagccccctcccccggcagctctgac 

PGSCGAGAPSPGSSD 
586 gtctccaccgcagggactggtgcttctcggagctcccactcctca 

VSTAGTGASRSSHSS 
631 gactccggtggaagtgacgtggacctggatcccactgatggcaag 

DSGGSDVDLDPTDGK 
676 ctcttccccagcgatggttttcgtgactgcaagaagggggatccc 

LFPSDGFRDCKKGDP 
721 aagcacgggaagcggaaacgaggccggccccgaaagctgagcaaa

KHGKRKRGRPRKLSK 
766 gagtgctgggactgtctcgagggcaagaagagcaagcacgcgccc 

ECWDCLEGKKSKHAP 
811 agaggcacccacctgtgggagttcatccgggacatcctcatccac 

RGTHLWEFIRDILIH 
856 ccggagctcaacgagggcctcatgaagtgggagaatcgacatgaa 

PELNEGLMKWENRHE 
901 ggcgtcttcaagttcctgcgctccgaggctgtggcccaactatgg 

GVFKFLRSEAVAQLW 
946 ggccaaaagaaaaagaacagcaacatgacctacgagaagctgagc
GQKKKNSNMTYEKLS
991 cgggccatgaggtactactacaaacgggagatcctggaacgggtg
RAMRYYYKREILERV
1036 gatggccggcgactcgtctacaagtttggcaaaaactcaagcggc 
DGRRLVYKFGKNSSG
1081 tggaaggaggaagaggttctccagagtcggaactga 1116 
WKEEEVLQSRN* 
gggttggaactatacccgggaccaaactcacggaccactcgaggcctgc 
aaaccttcctgggaggacaggcaggccagatggcccctccactggggaat 
gctcccagctgtgctgtggagagaagctgatgttttggtgtattgtcagc
catcgtcctgggactcggagactatggcctcgccttcccacccttctctt 
ggaattacaaagccctggggtttgaactgactttatagcttgcaagtgta 
tctccttttatctggtgcctcctcaaacccagtcttcaaacactaaatgc 
agacaacaccttcttctgcaaacaccctggacttgacccaaggaggccct 
ggggaggccctaggggagcaccgtgatgagaggacagagcaggggctcca
gcaccttctttctggactggcgttcacctccctgctcagtgcttgggctc 
cacgggcaggggtcagagcactccctaatttatgtgctatataaatatgt 
cagatgtacatagagatctattttttctaaaacattcccctccccactcc 
tctcccacagagtgctggactgttccaggccctccagtgggctgatgctg 
ggacccttaggatggggctcccagctcctttctcctgtgaatggaggcag
agacctccaataaagtgccttctgggctttttccaaaaaaaaaaaaaaaa 
aaaaaaaaa 
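
The pairing of each 45-nucleotide cDNA row with its predicted amino acid row can be spot-checked by translating the row in frame with the standard genetic code. The following is a minimal sketch only; the function name `translate` and the choice of rows are illustrative assumptions and are not part of the sequence listing.

```python
from itertools import product

# Standard genetic code, enumerated with bases in T, C, A, G order
# (first base varies slowest, third base fastest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cdna: str) -> str:
    """Translate a cDNA string in frame 1; whitespace is ignored (hypothetical helper)."""
    seq = "".join(cdna.split()).upper()
    usable = len(seq) - len(seq) % 3  # drop any trailing partial codon
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


# Row starting at nucleotide 1 (the a of the atg start codon).
first_row = translate("atggctacaacctgtgagattagcaacatttttagcaactacttc")
# Row starting at nucleotide 1081, ending at the tga stop codon.
last_row = translate("tggaaggaggaagaggttctccagagtcggaactga")
```

Under these assumptions, `first_row` corresponds to the amino acid row printed beneath nucleotide 1 and `last_row` to the row ending in the stop symbol.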

SEQ ID NO:5 - ELF3 intron 4
gtgagaacccgttttctccttcctt^
ctccccagacttcttcctcctt^

ctggtcccctgagccctctctgagttctcacctcctcttcccag 

SEQ ID NO:6 - ELF3 intron 5

gtgagagctctctctgggccacaacctcccttccccgaagtgtcccttgttccctctggrt 
gccttctggc 

aggaacaggaacaggctgggaagtgtgtcctgagagccagcagcgtggttgaacagaaggtgggccggcagg
gacttactctgaccccgccccccag

SEQ ID NO:7 - ELF3 intron 6

gtgagtcgagggaggtccecaag^ 

ctgtagaggggctactctccaaactccctt^ 

SEQ ID NO:8 - ELF3 intron 7

gtgagctccgggggcacgtgggtcrtrc^ 

atccmctcccgmtctctggccte^^ 

accacctaattgcagagcctggcttggtg^^ 

gtaccagagtcctgtccaagactccagtgtcctgtccactgartccagttctct^ 

ctgggacaccctcaatgtgaggaggcagctgg^ 

ccgcacagagggcactgcggggtctctggagagg^ 

gg ca gggttgg<*ggccttgggtgagaggggacacctggatggca^^ 

SEQ ID NO:9 - ELF3 intron 8
gtgagctggcggccaggaccctcacgatacagrc^ 

ggctgcc^rttggtttmgcaacagggctgagtccttagagtgaggacaacatctgg^ 

aatgacaacatggagaaagtattagcctggcagacagcagacara^ 

atcgcctgtgaggcttgtcctcaggaaggcac^ 

cctggctttctcctcgttgaagcacttagg tig LLLUctctgggcctcagtttcctcctgtgtccaggagtacactagat 

catcttaagatcccgtccagccctaaaatcat^ 

cgttggccaggccggtctcaaactcctggcc*^ 

tgtgagccaccgtgcccagctccctggccttaaaagtcatgtaamaatgatcagacccca 

acaaagaagcaaaggcaaagagccctgtgtcctggg 

ccagtgtgaagaacagaggagmaggaagtgtgagtcaggrtc^^ 

gttacctgggggtaacgcgggcqaggtgggcgggrtggcagc^ 

atccttctcttcacccag 

SEQ ID NO:10 - ELF3 primary transcript - numbering as in SEQ ID NO:1.
ttta gagccgggta ggggagcgca

4801 gcggccagat acctcagcgc tacctggcgg aactggattt ctctcccgcc tgccggcctg
4861 cctgccacag ccggactccg ccactccg INTRON1> gt aggattcccc gcctgtcatt
ccctagccca 

4921 gctcttggga aactgcagag gggtccagag gatttgcagt tctgaacctg cacactccag 
4981 tctaggatct ccgagcaaga gcgtaggtgt cctgagggtc aaagaacaga gagagattgt 
5041 ctctgggaag gcagaatggc catgacgccg ctagtctggc tccagggccc cagagatctg 
5101 aggagggaag cccagctgga ggctcctgtg gtcctgccct ggtctgagat cttggagccc 
5161 ttcttgaaga gacggtgtcc gcagagttgc tgatcttcct gcccctgggg gctactcttg 
5221 cccagggttg ggcaaagcag agtagctggg agtgtaagga gaggaccctc gtcccctcac 
5281 caacctcatc ctctctcccc ctacccacag EXON2> gtagcctc START CDS> at 

ggctgcaacc tgtgagatta 

5341 gcaacatttt tagcaactac ttcagtgcga tgtacagctc ggaggactcc accctggcct 
5401 ctgttccccc tgctgccacc tttggggccg atgacttggt actgaccctg agcaaccccc 
5461 agatgtcatt ggagggtaca INTRON2> ggtgggtctc agcggggtgg gatggggcac 

ggagtgggag 

5521 acagatccat ctaagggcct gttagacaaa tgggggaata ggcagggagg agggtctcta 
5581 ggcaaattcc agggctagag gctgagactt agtgactgag gtgctggggg ttgtggggct 
5641 gtgacaggca gagggaggtg tcagatacca ggacaagggt gttgtgaatg ctacctcctg 
5701 cccctactct tgggatggct ccaagggctg aggtgtgaat ccccagtgtg ctccaggaat 
5761 ggggctgtgt gggctgggag tggtggctca cgcctgtaat cccagcactt tgggaggctg 
5821 agctgagcgg atcacctgag gtcaagagtt cgagaccagc ctagccaaca tggtgaaacc 
5881 ccgtctctac taaaaataca aaaaaaaatt tatcccagcg tggtggtggg cacctataat 
5941 cccagctact ggggaggctg acgcaggagt atcgcttgaa cctgggaggt ggaggttgct 
6001 gtgagccgag attgtgccat tgcaccccag cctaggtgac aggagtgaga ctccatctca 
6061 aaaaaaaaaa aaaaaatggg gctgtaaggt ctgctgggtg gcctgagctg agcctgtttc 
6121 cctgcctggc ccttgcag EXON3> ag aaggccagct ggttggggga acagccccag 
ttctggtcga 

6181 agacgcaggt tctggactgg atcagctacc aagtggagaa gaacaagtac gacgcaagcg 
6241 ccattgactt ctcacgatgt gacatggatg gcgccaccct ctgcaattgt gcccttgagg 
6301 agctgcgtct ggtctttggg cctctggggg accaactcca tgcccagctg cgagacctca
6361 INTRON3> gtgagtccag gcccctggag gctggggagc agctccacat gttgagctga
gtcgagttca 

6421 gtgtggccgt aggcaggccc tggagctctg ggccagctgc acagccagag agagcccttg
6481 agggagggat taggggagtg tgacccttcc ttccttcctt gtcag EXON4> cttcc 
agctcttctg

6541 atgagctcag ttggatcatt gagctgctgg agaaggatgg catggccttc caggaggccc 
6601 tagacccagg gccctttg INTRON4> gt gagaacccgt tttctccttc cttccccagc 
ctgtcttgtc 

6661 ccatccctgc ccctccacag agtgctagag atgaccccct ccccagactt cttcctccct 
6721 caattagaaa aattgcagca ggtcatcaga cccatgggca gcatcacctg tcctggtctg
6781 gtcccctgag ccctctctga gttctcacct cctcttccca g EXON5> accagggca 
gcccctttgc 

6841 ccaggagctg ctggacgacg gtcagcaagc cagcccctac caccccggca gctgtggcgc 
6901 aggagccccc tcccctggca gctctgacgt ctccaccgca g INTRON5> gtgagagct 
ctctctgggc

6961 cacaacctcc cttccccgaa gtgtcccttg ttccctctgg ctcccagcac cataactcag 
7021 gccttctggc aggaacagga acaggctggg aagtgtgtcc tgagagccag cagcgtggtt
7081 gaacagaagg tgggccggca ggggacttac tctgaccccg ccccccag EXON6> gg 
actggtgctt 

7141 ctcggagctc ccactcctca gactccggtg gaagtgacgt ggacctggat cccactgatg

7201 gcaagctctt ccccagcg INTRON6> gt gagtcgaggg aggtccccaa gagggcgtcc 
catttagcaa 

7261 tgcacagggg gcccggctct tcctgcagcc ttttcctgta gaggggctac tctccctaac 
7321 tcccctcttg cccctccttg accttccacc accgtcccca cag EXON7> atggttt
tcgtgactgc

7381 aagaaggggg atcccaagca cgggaagcgg aaacgaggcc ggccccgaaa gctgagcaaa 
7441 gagtactggg actgtctcga gggcaagaag agcaagcacg INTRON7> gtgagctccg 
ggggcacgtg 

7501 ggtcctccct gcgccgggct gagcggcttc ctggggcact gcgggttgtt gcaggtatcc 
7561 cttctcccgt tttctctggc ctccgcatgg cctttggtaa ggctgtgcac aagctggggg
7621 ctctatggta tcggtcacca cctaattgca gagcctggct tggtggtcct ggagaggagg
7681 aggaaataag gctcccagtg ggaggctcat ggtaccagag tcctgtccac tgactccagt 
7741 gtcctgtcca ctgactccag ttctctctgc acttggccac tgtcctgccc tctgggacac 
7801 cctcaatgtg aggaggcagc tggtgggtct taggtgggct gaggagaaaa gcagtcactg
7861 cagtacccgc acagagggca ctgcggggtc tctggagagg cttgctgcat gctgtggcca 
7921 agtcagcagt gcactggggc gggcagggct ggctggcctt gggtgagagg ggacacctgg 
7981 atggcaaact gatggaggct ggccttgcag EXON8> cgcccagagg cacccacctg 
tgggagttca 

8041 tccgggacat cctcatccac ccggagctca acgagggcct catgaagtgg gagaatcggc 
8101 atgaaggcgt cttcaagttc ctgcgctccg aggctgtggc ccaactatgg ggccaaaaga 
8161 aaaagaacag caacatgacc tacgagaagc tgagccgggc catgag INTRON8> gtga 
gctggcggcc 

8221 aggaccctca cgatacagcc ggacatgggg acaggcgctc acactcccac cgccctcttt 
8281 ctggctgcca cttggtttct tgcaacaggg ctgagtcctt agagtgagga caacatctgg 
8341 gttggtctac ttcatggatt aaatgacaac atggagaaag tattagcctg gcagacagca 
8401 gacacagtgc acttgagcta gcagcaacat ttcttgtatc gcctgtgagg cttgtcctca 
8461 ggaaggcacc tggagagtgg gaaagggggc aggagccgtg cccacccagg gcctggcttt 
8521 ctcctcgttg aagcacttag gttgtttttc tctgggcctc agtttcctcc tgtgtccagg 
8581 agtacactag atcatcttaa gatcccgtcc agccctaaaa tcatgtactt actttttttt 
8641 tctttttctt ttttaaatag aggcaagggt ctctacgttg gccaggccgg tctcaaactc 
8701 ctggcctcaa atgactctcc tgcctcggcc tctcaaagtg ctgggattac aggtgtgagc 
8761 caccgtgccc agctccctgg ccttaaaagt catgtaattt aatgatcaga ccccagtcac
8821 agccatagga tacaaagaag caaaggcaaa gagccctgtg tcctgggcac ggttacaggc 
8881 cagtgtaggg aaagagcttc tgcttgccag tgtgaagaac agaggagttt aggaagtgtg 
8941 agtcaggctc agcttagtca ggcagagacc agtgggcatg ggttacctgg gggtaacgcg 
9001 ggccaggtgg gcgggctggc agcctggggc ccatttcctg ccaaagcacc tctgaccatc 
9061 cttctcttca cccag EXON9> gtact actacaaacg ggagatcctg gaacgggtgg 
atggccggcg 

9121 actcgtctac aagtttggca aaaactcaag cggctggaag gaggaagagg ttctccagag
9181 tcggaactga END CDS gggttggaac tatacccggg accaaactca cggaccactc
gaggcctgca 

9241 aaccttcctg ggaggacagg caggccagat ggcccctcca ctggggaatg ctcccagctg
9301 tgctgtggag agaagctgat gttttggtgt attgtcagcc atcgtcctgg gactcggaga 
9361 ctatggcctc gcctccccac cctcctcttg gaattacaag ccctggggtt tgaagctgac
9421 tttatagctg caagtgtatc tccttttatc tggtgcctcc tcaaacccag tctcagacac 
9481 taaatgcaga caacaccttc ctcctgcaga cacctggact gagccaagga ggcctgggga 
9541 ggccctaggg gagcaccgtg atggagagga cagagcaggg gctccagcac cttctttctg 
9601 gactggcgtt cacctccctg ctcagtgctt gggctccacg ggcaggggtc agagcactcc
9661 ctaatttatg tgctatataa atatgtcaga tgtacataga gatctatttt ttctaaaaca
9721 ttcccctccc cactcctctc ccacagagtg ctggactgtt ccaggccctc cagtgggctg 
9781 atgctgggac ccttaggatg gggctcccag ctcctttctc ctgtgaatgg aggcagagac 
9841 ctccaataaa gtgccttctg ggctttttct a 

SEQ ID NO:11 - 531 bp GC3 DNA sequence isolated from modified RDA. GC3 is
located within intron 7 and extends to exon 8 of the ELF3 gene between 7514
and 8045 (using SEQ ID NO:1 numbering). The GC3 primers are in bold; the 202
bp GC3 fragment amplified by the GC3 primers is underlined.
CCGGGCTGAGCGGCTTCCTGGGGCACTGCGGGTTGTTGCAGGTATCCCCTCTC 
CCGTTTCCTCTGGCCTCCGCATGGCCTTTGGTAAGGCTGTGCACAAGCTGGGGG 
CTCTATGGTATCGGTCACCACCTAATTGCAGAGCCAGGCTTGGTGGTCCTGGAG
AGGAGGAGGAAATAAGGCTCCCAGTGGGAGGCTCATGGTACCAGAG TCCTGTC 
CACTGACTCGAGTGTCCTGTCCACTGACTCCAGTTCTCTCTGCACTTGGCCACT 
GTCCTGCCCTCTGGGTCACCCTCAATGTGAGGAGGCGGCTGGTGGGTCTTAGG 
TGGGCTGAGGAGAAAAGCAGTCACTGCAGTACCCGCACAGAGGGCACTGCGGG 
GTCTCTGGAGAGGCTTGCTGCATGCTGTGGCCAAGT CAAGCAGTGCACTGGG
GCGGCAGGGCTGGCTGGCCTTGGGTGAGAGGGGGCACCTGGATGGCAAACGG 
ATGGAGGCTGGCTTGCAGCGCCCAGAGGCACCCACCTGTGGGAGTTCATCCGG

SEQ ID NO:12 - 1002 bp unspliced mRNA of the ELF3 gene (from 6550 to 7551
of the ELF3 gene, using SEQ ID NO:1 numbering) in human breast tumor cell
lines. The unspliced entire intron 4, intron 5, intron 6 and 5' portion of intron 7
are underlined. The intron/exon splice junction borders are in bold.

GTTGGATCATTGAGCTGCTGGAGAAGGATGGCATGGCCTTCCAGGAGGCCCTA 

GACCCAGGGCCCTTT GGTGAGAACCCGTTTTCTCCTTCCTTCCCCAGCCTGTCT 

TGTCCCAT CCCTGCCCCTCCACAGAGTGCTAGAGATGACCCCCTCCCCAGACTT 

CTTCCrCCC TCAATTAGAAAAATTGCAGCAGGTCATCAGACCCATGGGCAGCAT 

CACCTGTC CTGGTCTGGTCCCCTGAGCCCTCTCTGAGTTCTCACCTCCTCTTrGC 

AGACCAGGGCAGCCCCTTTGCCCAGGAGCTGCTGGACGACGGTCAGCAAGCCA 

GCCCCTACCACCCCGGCAGCTGTGGCGCAGGAGCCCCCTCCCCTGGCAGCTCT 

GACGTCTCCACCGCAG GTGAGAGCTCTCTCTGGGCCACAACCTCCCTTCCCrG 

AAGTGTCCCTTGTTCCCTCTGGCTCCCAGCACCATAACTCAGGCCTTCTGGCAG 

GAACAGGAACAGGCTGGG AAGTGTGTCCTGAGAGCCAGCAGCGTGGTTGAACA 

GAAGGTGGGCCGGCAGGG GACTTACTCTGACCCCGCCCCCCAGG GACrnGTa 

CTTCTCGGAGCTCCCACTCCTCAGACTCCGGTGGAAGTGACGTGGACCTGGATC 

CCACTGATGGCAAGCTCTTCCCCAGC GGTGAGTCGAGGGAGGTCCCCAAGAGG 

GCGTCCCATTTAGCAATGCACAGGGGGCCCGGCTCTTCCTGCAGCCTTTTCCTG 

TAGAGGGGCTACTCTCCCTAACTCCCCTCTTGCCCCTCCTTGACCTTCGACCACC 

GTCCCCACA^ATGGTTTTCGTGACTGCAAGAAGGGGGATCCCAAGCACGGGAA 

GCGGAAACGAGGCCGGCCCCGAAAGCTGAGCAAAGAGTACTGGGACTGTCTCG 

AGGGCAAGAAGAGCAAGCAC GGTGAGCTCCGGGGGCACGTGGGTCCTCCCTG 

CGCCGGGCTGAGCGGCTTCCTGGGGCACTGCGGGTTGTTG 

SEQ ID NO:13 - An Alu kwd - the bold letters indicate a 17 bp sequence located
at the end of the Alu sequence that repeats nucleotides 8746 to 8762 of the ELF3
sequence

GTATGCTTGGCC1 11 1'ClUlT'lTCITCTr Cl 1C1 1 111A TTTTTCGAGACAGGGTC 

TCGCTCTGTCACCCAGGTTAGAGTGCAGTGGCACAATCTTGGCTCGCTACAACC 

TCTGCCTGCCGGGTTCAAGTGATTCTTGTGCCTCAGCCTCCAAGTAGCTGGGAT
TACAGGCACCTGCCACCATGCCCAGCTAATTTTTGTATTTTTAGTAGAGACGGG
GGTTTCACCATGTTGGCTAGGCTGGTCTCGAACTCCTGACCTCAAGTGATCCGC 
CCGCCTCAGCCTCCCAAAGTGCTGGAATTACAGGTGTGAGCGA 

SEQ ID NO:14 - An antisense insertion of a 315 bp Alu kwd sequence in an ELF3
sequence. This sequence shows the ELF3 gene from nt 8685 to 9107
(numbering of SEQ ID NO:1) containing a 315 bp antisense insertion of the
Alu kwd sequence. Underlined letters represent the 315 bp Alu kwd sequence; the
bold letters indicate a 17 bp sequence located at the end of the Alu kwd sequence
that repeats nt 8746 to 8762 of the ELF3 sequence. The bracketed numbers
show the insertion point of the Alu kwd in the ELF3 DNA sequence.

GGCCGGTCTCAAACTCCTGGCCTCAAATGACTCTCCTGCCTCGGCCTCTCAAAG 
TGCTGGGATTACAGGTGTGAGCCA[8762]GTATGCTTGGC C ' l I ' l ' l C ' l ' l ' lT I 'CTTT
CTTCl^GTU'fTTATTTTTC GAGACAGGGTCTCGCTCTGTCACCCAGGTTAGAGTG
CAGTGGCACAATCTTGGCTCGCTACAACCTCTGCCTGCCGGGTTCAAGTGATTC
TTGTGCCTCAGCCTCCAAGTAGCTGGGATTACAGGCACCTGCCACCATGCCCAG
CTAATTTTTGTATTTTTAGTAGAGACGGGGGTTTCACCATGTTGGCTAGGCTGG
TCTCGAACTCCTGACCTCAAGTGATCCGCCCGCCTCAGCCTCCCAAAGTGCTGG
AATTACAGGTGTGAGCGA[8763]CCGTGCCCAGCTCCCTGGCCTTAAAAGTCA

TGTAATTTAATGATCAGACCCCAGTCACAGCCATAGGATACAAAGAAGCAAAGG 

CAAAGAGCCCTGTGTCCTGGGCACGGTTACAGGCCAGTGTAGGGAAAGAGCTT 

CTGCTTGCCAGTGTGAAGAACAGAGGAGTTTAGGAAGTGTGAGTCAGGCTCAG 

CTTAGTCAGGCAGAGACCAGTGGGCATGGGTTACCTGGGGGTAACGCGGGCCA 

GGTGGGCGGGCTGGCAGCCTGGGGCCCATTTCCTGCCAAAGCACCTCTGACCA 

TCCTTCTCTTCACCCAGGTACTACTACAAACGGGAGATCCTGGAACGGG 

SEQ ID NO:15 - the sequence of the novel ELF3 5' UTR.

ctccgccactccggtaggattccccgcctgtcattccctagcccagctcttgggaaactgcagaggggtccagagga
tttgcagttctgaacctgcacactccagtctaggatctccgagcaagagcgtagcctc

SEQ ID NO:16 - GC3 sense primer - codons 7722-7741 of the ELF3 gene.
CCTGTCCACTGACTCCAGTG

SEQ ID NO:17 - GC3 antisense primer - codons 7923-7905 of the ELF3 gene.
ACTTGGCCACAGCATGCAG
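
An antisense primer is listed 5' to 3' on the strand opposite the numbered sequence, so it should equal the reverse complement of the stated positions. The sketch below checks this for the GC3 antisense primer; the 19-nucleotide string was read by hand from rows 7861 and 7921 of the SEQ ID NO:1 listing, and the function name is an illustrative assumption rather than anything defined in this document.

```python
# Complement map for lower- and upper-case DNA letters.
_COMPLEMENT = str.maketrans("acgtACGT", "tgcaTGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string (hypothetical helper)."""
    return seq.translate(_COMPLEMENT)[::-1]


# Nucleotides 7905-7923 as read from the SEQ ID NO:1 listing (assumed transcription).
genomic_7905_7923 = "ctgcatgctgtggccaagt"
gc3_antisense_primer = "ACTTGGCCACAGCATGCAG"  # SEQ ID NO:17

primer_matches = reverse_complement(genomic_7905_7923).upper() == gc3_antisense_primer

# Span from the first base of the sense primer (7722) to the last base
# covered by the antisense primer (7923), counted inclusively.
gc3_amplicon_bp = 7923 - 7722 + 1
```

With those coordinates the inclusive span agrees with the 202 bp GC3 fragment described for SEQ ID NO:11.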

SEQ ID NO:18 - GC3 UPF antisense primer - codons 7572-7598 of the ELF3
gene.

ACCAAAGGCCATGCGGAGGCCAGAGAA

SEQ ID NO:19 - GC3 UPN antisense primer - codons 7523-7551 of the ELF3
gene.

CAACAACCCGCAGTGCCCCAGGAAGCCC

SEQ ID NO:20 - GC3 DF sense primer - codons 7943-7970 of the ELF3 gene.
GCAGGGCTGGCTGGCCTTGGGTGAGAGG

SEQ ID NO:21 - GC3 DN sense primer - codons 8004-8030 of the ELF3 gene.
CTTGCAGCGCCCAGAGGCACCCACCTG

SEQ ID NO:22 - GC3 (1-3) sense primer - codons 4819-4843 of the ELF3 gene.
GCTACCTGGCGGAACTGGATTTCTC

SEQ ID NO:23 - GC3 (1-3) antisense primer - codons 6240-6216 of the ELF3
gene.

CGCTTGCGTCGTACTTGTTCTTCTC

SEQ ID NO:24 - GC3 (3-6) sense primer - codons 6180-6205 of the ELF3 gene.
AAGACGCAGGTTCTGGACTGGATCAG

SEQ ID NO:25 - GC3 (3-6) antisense primer - codons 7194-7171 of the ELF3
gene.

TGGGATCCAGGTCCACGTCACTTC

SEQ ID NO:26 - GC3 (6-8) sense primer - codons 7155-7179 of the ELF3 gene.
TCCTCAGACTCCGGTGGAAGTGACG

SEQ ID NO:27 - GC3 (6-8) antisense primer - codons 8109-8174 of the ELF3
gene.

CCGGCTCAGCTTCTCGTAGGTCATG

SEQ ID NO:28 - GC3 (8-9) sense primer - codons 8065-8089 of the ELF3 gene.
AGCTCAACGAGGGCCTCATGAAGTG

SEQ ID NO:29 - GC3 (8-9) antisense primer - codons 9352-9327 of the ELF3
gene.

TCCCAGGACGATGGCTGACAATACAC

SEQ ID NO:30 - β-actin ES31 primer
CCCCAGCCATGTACGTTGCTATCC

SEQ ID NO:31 - β-actin ES33 primer
GCCTCAGGGCAGCGGAACCGCTCA

SEQ ID NO:32 - GC3DD sense primer - codons 8569-8596 of the ELF3 gene.
CCTGTGTCCAGGAGTACACTAGATCATC

SEQ ID NO:33 - INSE sense primer - codons 8659-8680 of the ELF3 gene.
AGAGGCAAGGGTCTCTACGTTG

SEQ ID NO:34 - INSE antisense primer - codons 8774-8795 of the ELF3 gene.
TCCCTGGCCTTAAAAGTCATGT



What is claimed is:

1. An isolated cDNA of a mammalian ELF3 gene, or fragment thereof at
least 20 nucleotides long, comprising at least one intron of the ELF3 gene or a 
portion of an intron of the ELF3 gene. 

2. The cDNA of claim 1, wherein the intron is selected from the group
consisting of intron 4, intron 5, intron 6, intron 7, intron 8, and combinations
thereof.

3. The cDNA of claim 1, wherein the intron is intron 4. 

4. The cDNA of claim 3, wherein the intron 4 comprises SEQ ID NO:5.

5. The cDNA of claim 1, wherein the intron is intron 5.

6. The cDNA of claim 5, wherein the intron 5 comprises SEQ ID NO:6. 

7. The cDNA of claim 1, wherein the intron is intron 6. 

8. The cDNA of claim 7, wherein the intron 6 comprises SEQ ID NO:7. 

9. The cDNA of claim 1, wherein the intron is intron 7. 

10. The cDNA of claim 1, wherein the intron 7 comprises SEQ ID NO:8.

11. The cDNA of claim 1, wherein the intron is intron 8.

12. The cDNA of claim 1, wherein the intron 8 comprises SEQ ID NO:9.

13. The cDNA of claim 1, wherein the cDNA also comprises SEQ ID
NO:13.

14. The cDNA of claim 13, wherein the SEQ ID NO:13 is within intron 8.

15. The cDNA of claim 14, wherein the SEQ ID NO:13 is between
nucleotides 8762 and 8763 using the numbering of SEQ ID NO:1.

16. The cDNA of claim 1, comprising SEQ ID NO:11.

17. The cDNA of any one of claims 1-16, wherein the ELF3 gene 
comprises a nucleotide sequence encoding the amino acid sequence of SEQ ID 
NO:3 or SEQ ID NO:4.

18. The cDNA of any one of claims 1-17, wherein the cDNA comprises
the entire ELF3 gene coding region.

19. The cDNA of claim 1, comprising introns 4, 5, 6 and 7 of the ELF3
gene.

20. The cDNA of any one of claims 1-19, wherein the ELF3 gene
comprises the nucleotide sequence of SEQ ID NO:15.

21. The cDNA of claim 20, comprising SEQ ID NO:2, wherein the cDNA 
may be interspersed by one or more introns. 

22. The cDNA of any one of claims 1-21, wherein the cDNA was
prepared from a composition comprising a cell, wherein the cell further
comprises genomic DNA comprising an Alu kwd, wherein the Alu kwd consists of
SEQ ID NO:13.

23. The cDNA of claim 22, wherein the Alu kwd is between nucleotides
8762 and 8763 of an ELF3 gene in the cell.

24. The cDNA of any one of claims 1-23, wherein the cDNA was
prepared from a composition comprising a cell, the cell obtained from a human
patient being tested for breast cancer.

25. The cDNA of claim 24, wherein the patient is at high risk for breast 
cancer. 

26. The cDNA of any one of claims 22-25, wherein the cell is a 
peripheral blood mononuclear cell. 

27. The cDNA of any one of claims 22-25, wherein the cell was obtained
from a tissue biopsy.

28. The cDNA of claim 27, wherein the tissue biopsy was a breast tissue 
biopsy. 

29. The cDNA of any one of claims 1-28, wherein the cDNA was 
prepared using RT-PCR.

30. The cDNA of claim 29, wherein the cDNA was amplified using 
primers that are suitable for amplifying at least a portion of intron 4 of the ELF3 
gene. 

31. The cDNA of claim 29, wherein the cDNA was amplified using
primers that are suitable for amplifying at least a portion of intron 5 of the ELF3
gene.

32. The cDNA of claim 29, wherein the cDNA was amplified using
primers that are suitable for amplifying at least a portion of intron 6 of the ELF3
gene.

33. The cDNA of claim 29, wherein the cDNA was amplified using
primers that are suitable for amplifying at least a portion of intron 7 of the ELF3
gene.

34. A vector comprising the cDNA of any one of claims 1-33. 

35. A cell transfected with the vector of claim 34. 

36. The cell of claim 35, wherein the cell is a prokaryote.

37. The cell of claim 35, wherein the cell is a eukaryote.

38. The cell of claim 35, wherein the vector sequence comprising the 
cDNA is integrated into a chromosome of the cell. 

39. The cell of claim 35, wherein the vector autonomously replicates in
the cell.

40. The cell of any one of claims 35-39, wherein the cDNA comprises the
entire ELF3 gene coding region.

41. A set of two primers, each less than 30 nucleotides in length, 
wherein each primer is homologous to a portion of an ELF3 gene, and (a) 
wherein at least one primer is homologous to a portion of an intron of the ELF3 
gene or (b) wherein each primer is homologous to a portion of different exons
of the ELF3 gene. 

42. The set of two primers of claim 41, wherein the intron of the ELF3
gene is selected from the group consisting of intron 4, intron 5, intron 6, intron
7 and intron 8.

43. The set of two primers of claim 41, wherein the intron of the ELF3
gene is intron 8.

44. The set of two primers of claim 43, wherein one of the two primers
is homologous to a region of an ELF3 gene 5' to nt 8762 of the ELF3 gene, and
the other of the two primers is homologous to a region of the ELF3 gene 3' to nt
8763 of the ELF3 gene.

45. A set of two primers, each less than 30 nucleotides in length,
wherein at least one primer is at least 95% homologous to SEQ ID NO: 13.

46. The set of two primers of claim 45, wherein both primers are at least 
95% homologous to SEQ ID NO: 13. 

47. The set of two primers of claim 45, wherein both primers are
homologous to SEQ ID NO: 13.

48. The set of two primers of claim 45, wherein one primer is 
homologous to SEQ ID NO:13 and the other primer is homologous to a portion 
of an ELF3 gene. 

49. The set of two primers of claim 45, wherein one primer is
homologous to SEQ ID NO: 13 and the other primer is homologous to an intron
of an ELF3 gene.

50. The set of two primers of claim 49, wherein the other primer is
homologous to intron 8 of the ELF3 gene.

51. A set of two primers, each less than 30 nt in length, wherein the set
of primers is suitable for amplifying an ELF3 5' UTR that is at least 95%
homologous to SEQ ID NO:15.

52. The set of two primers of claim 51, wherein one primer is
homologous to SEQ ID NO:15 and the other primer is homologous to an ELF3
gene.

53. An isolated nucleic acid or mimetic between about 20 nucleotides
and about 5,000 nucleotides long comprising a sequence homologous to at least
a portion of an intron of a human ELF3 gene.

54. The isolated nucleic acid or mimetic of claim 53, wherein the intron
of the human ELF3 gene is selected from the group consisting of intron 4, intron
5, intron 6, intron 7 and intron 8.

55. The isolated nucleic acid or mimetic of claim 53, wherein the intron
is intron 4.

56. The isolated nucleic acid or mimetic of claim 55, wherein the 
sequence comprises an entire intron 4 or its complement. 

57. The isolated nucleic acid or mimetic of claim 53, wherein the intron
is intron 5.

58. The isolated nucleic acid or mimetic of claim 57, wherein the 
sequence comprises an entire intron 5 or its complement. 

59. The isolated nucleic acid or mimetic of claim 53, wherein the intron
is intron 6.

60. The isolated nucleic acid or mimetic of claim 59, wherein the 
sequence comprises an entire intron 6 or its complement. 

61. The isolated nucleic acid or mimetic of claim 53, wherein the intron
is intron 7.

62. The isolated nucleic acid or mimetic of claim 61, wherein the 
sequence comprises an entire intron 7 or its complement. 

63. An isolated nucleic acid or mimetic between about 20 nucleotides
and about 5000 nucleotides long comprising a sequence at least 95%
homologous to SEQ ID NO:13 or its complement.

64. The isolated nucleic acid or mimetic of claim 63, wherein the 
sequence is homologous to at least 50 nucleotides of SEQ ID NO:13. 

65. The isolated nucleic acid or mimetic of claim 63, wherein the
sequence is homologous to the entire SEQ ID NO:13.

66. The isolated nucleic acid or mimetic of claim 63, wherein the
sequence further comprises a second sequence homologous to at least 10
consecutive nucleotides of SEQ ID NO: 10 or its complement.

67. The isolated nucleic acid or mimetic of claim 66, wherein the second
sequence is homologous to at least 10 consecutive nucleotides of SEQ ID NO:9
or its complement.

68. The isolated nucleic acid or mimetic of claim 66, wherein the second
sequence is homologous to at least 10 consecutive nucleotides immediately 5' to
nucleotide 8763, or immediately 3' to nucleotide 8762 of the ELF3 gene, using
the ELF3 gene sequence numbering of SEQ ID NO:1.

69. The isolated nucleic acid or mimetic of claim 63, comprising a
nucleotide sequence homologous to at least nucleotides 8752 to 8773 of SEQ ID
NO: 1, with SEQ ID NO:13 inserted between nucleotides 8762 and 8763.

70. An isolated nucleic acid or mimetic between about 20 nucleotides
and about 5000 nucleotides long comprising a nucleotide or mimetic sequence
at least 95% homologous to SEQ ID NO:15.

71. The isolated nucleic acid or mimetic of claim 70, wherein the
nucleotide or mimetic sequence is homologous to SEQ ID NO: 15.

72. The isolated nucleic acid or mimetic of claim 70 or 71, further
comprising at least 20 nucleotides or mimetics of the 5' end of a sequence
encoding SEQ ID NO:3 or SEQ ID NO:4 or its complement adjoining the 3' end
of the nucleotide sequence.

73. The isolated nucleic acid or mimetic of any one of claims 53-72, 
wherein the nucleotide or mimetic sequence is DNA. 

74. The isolated nucleic acid or mimetic of any one of claims 53-72,
wherein the nucleotide or mimetic sequence is RNA.

75. The isolated nucleic acid or mimetic of any one of claims 53-74, 
wherein the sequence is less than 2000 nucleotides long. 

76. The isolated nucleic acid or mimetic of any one of claims 53-74, 
wherein the sequence is less than 1000 nucleotides long. 

77. The isolated nucleic acid or mimetic of any one of claims 53-74,
wherein the sequence is less than 500 nucleotides long.

78. The isolated nucleic acid or mimetic of any one of claims 53-74,
wherein the sequence is less than 200 nucleotides long.

79. The isolated nucleic acid or mimetic of any one of claims 53-74, 
wherein the sequence is less than 100 nucleotides long. 

80. The isolated nucleic acid or mimetic of any one of claims 53-74,
wherein the sequence is more than 100 nucleotides long.

81. The isolated nucleic acid or mimetic of any one of claims 53-74, 
wherein the sequence is more than 200 nucleotides long. 

82. The isolated nucleic acid or mimetic of any one of claims 53-74, 
wherein the sequence is more than 500 nucleotides long. 

83. The isolated nucleic acid or mimetic of any one of claims 53-74,
wherein the sequence is more than 1000 nucleotides long.

84. A vector comprising the sequence of any one of claims 53-83. 

85. A cell comprising the vector of claim 84. 

86. A probe comprising the isolated nucleic acid or mimetic of any one
of claims 53-83, the probe further comprising a detectable label.

87. The probe of claim 86, wherein the detectable label is fluorescent or
radioactive.

88. A pair of cell cultures, where each cell culture is of the same tissue
type and is derived from a biopsy of cancerous mammalian tissue, and where
one of the cell lines is of cancerous cells and the other cell line is of matched
noncancerous cells.

89. The pair of cell cultures of claim 88, wherein the mammalian tissue 
is breast tissue. 

90. The pair of cell cultures of claim 88 or 89, wherein the cells are
myofibroblast cells or CD4+ lymphocytes.

91. A method for determining whether a patient has cancer or is at risk
for cancer, the method comprising evaluating whether a cell in the patient
comprises a nucleic acid sequence selected from the group consisting of an ELF3
mRNA retaining at least a portion of an intron, a sequence at least 95%
homologous to SEQ ID NO: 15, and an Alu^, wherein a patient comprising at
least one of those sequences has cancer or is at risk for cancer.

92. The method of claim 91, wherein the sequence is ELF3 mRNA 
retaining at least a portion of an intron. 

93. The method of claim 91, wherein the sequence is SEQ ID NO: 15.

94. The method of claim 91, wherein the sequence is Alu^.

95. The method of claim 94, wherein the sequence is SEQ ID NO: 13. 

96. The method of any one of claims 91-95, the method further
comprising a polymerase chain reaction.

97. The method of claim 96, wherein the method comprises reverse 
transcriptase-polymerase chain reaction. 

98. The method of any one of claims 91-95, the method further
comprising a Northern hybridization or a Southern hybridization.

99. The method of any one of claims 91-95, the method further
comprising sequencing the nucleic acid sequence.

100. The method of any one of claims 91-99, wherein the cell is a
PBMC.

101. The method of any one of claims 91-99, wherein the cell is from a
tissue biopsy.

102. The method of claim 101, wherein the tissue is breast tissue. 

103. The method of claim 101 or 102, wherein the cell is from a tissue
effusion.

104. A kit for evaluating whether a patient has cancer or is at risk for 
cancer, the kit comprising 

a. the set of two primers of any one of claims 41-52, and 

b. instructions directing the use of the primers for determining
whether a portion of the ELF3 gene amplified by the two primers is present in a
nucleic acid preparation.

105. The kit of claim 104, wherein each primer is homologous to a
portion of the ELF3 gene or the complement of the ELF3 gene, and

wherein at least one primer is homologous to a portion of an intron of 
the ELF3 gene, or 

wherein the two primers are homologous to portions of the ELF3 gene, 
or its complement, that flank an intron of the ELF3 gene. 

106. The kit of claim 105, wherein the intron of the ELF3 gene is 
selected from the group consisting of intron 4, intron 5, intron 6, intron 7 and 
intron 8. 

107. The kit of claim 105, wherein the intron of the ELF3 gene is intron
8.

108. The kit of claim 107, wherein one of the two primers is 
homologous to a region of an ELF3 gene 5' to nt 8762 of the ELF3 gene or its 
complement, and the other of the two primers is homologous to a region of the 
ELF3 gene 3' to nt 8763 of the ELF3 gene or its complement. 

109. A kit for evaluating whether a patient has cancer or is at risk for
cancer, the kit comprising

a. at least one of 

i. the isolated nucleic acid or mimetic of any one of claims
53-83, or

ii. the probe of claim 86 or 87; and 

b. instructions directing the use of the nucleic acid or mimetic for
determining whether a nucleic acid sequence homologous to the probe is
present in a sample from the patient.

110. The kit of claim 109, wherein the isolated nucleic acid or mimetic is
on a gene chip.

111. The kit of claim 109 or 110, further comprising instructions to
sequence any nucleic acid sequence homologous to the probe that is present in
the sample from the patient.

112. A method for determining whether a cell or a sample comprises a
virus, the method comprising

(a) adding contents of the cell or the sample to a culture, where 
the culture comprises a susceptible cell that is capable of acquiring a 
characteristic upon infection with a virus, the characteristic selected from the 
group consisting of intron retention of ELF-3 mRNA, and acquisition of SEQ ID 
NO:13 in an ELF3 gene; and 

(b) subsequently determining whether a susceptible cell has
acquired the characteristic after addition of the contents of the cell.

113. The method of claim 112, wherein the characteristic is intron 
retention of ELF-3 mRNA. 

114. The method of claim 112, wherein the characteristic is acquisition 
of SEQ ID NO:13 in an ELF3 gene. 

115. The method of any one of claims 112-114, wherein the susceptible 
cell is a BJAB cell. 



Abstract 

ELF3 gene compositions associated with cancer are provided, including 
ELF3 mRNA intron retention, a novel ELF3 5' untranslated region, and a novel 
Alu, Alu^. Methods and kits using primers or probes to detect the presence of 
these ELF3 gene compositions are also provided. Methods for determining 
whether a cell comprises a virus are also provided. 